Abstract

The goal of the present paper is the derivation of a framework for the finite-length 
analysis of message-passing iterative decoding of low-density parity-check codes. To this 
end we introduce the concept of graph-cover decoding. Whereas in maximum-likelihood 
decoding all codewords in a code are competing to be the best explanation of the received 
vector, under graph-cover decoding all codewords in all finite covers of a Tanner graph
representation of the code are competing to be the best explanation. 

We are interested in graph-cover decoding because it is a theoretical tool that can 
be used to show connections between linear programming decoding and message-passing 
iterative decoding. Namely, on the one hand it turns out that graph-cover decoding is 
essentially equivalent to linear programming decoding. On the other hand, because iterative,
locally operating decoding algorithms like message-passing iterative decoding cannot
distinguish the underlying Tanner graph from any covering graph, graph-cover decoding 
can serve as a model to explain the behavior of message-passing iterative decoding. 

Understanding the behavior of graph-cover decoding is tantamount to understanding 
the so-called fundamental polytope. Therefore, we give some characterizations of this 
polytope and explain its relation to earlier concepts that were introduced to understand 
the behavior of message-passing iterative decoding for finite-length codes. 

Index Terms: Graph-cover decoding, iterative decoding, message-passing algorithms, linear programming
decoding, fundamental polytope, fundamental cone, pseudo-codewords, minimal pseudo-codewords,
pseudo-weight.
1 Introduction


Low-density parity-check (LDPC) codes were introduced by Gallager [1, 2]. As important
as the codes themselves was a class of decoding algorithms that he presented. These
algorithms had two common features. Firstly, based on the observed channel output, these 
algorithms tried to iteratively find the codeword that was sent over the channel. Secondly, 
these algorithms operated locally in the sense that they combined partial information that 
could then be used in other partial-information combining. 

Although revolutionary, these codes and decoding algorithms were forgotten for a long
time. The main reason was that, although these algorithms were computationally far less
demanding than maximum a-posteriori decoding (MAPD) and maximum-likelihood decoding
(MLD), they were nevertheless too complex for that time. Besides some work by Zyablov [3],
Zyablov and Pinsker [4], Tanner [5], and Margulis [6], Gallager’s ideas lay dormant for about
30 years. Then, in the mid-1990s, the discovery of turbo codes by Berrou, Glavieux, and
Thitimajshima [7], the rediscovery of LDPC codes by MacKay and Neal [8, 9, 10], and the work
of Wiberg, Loeliger, and Koetter [11, 12] on codes on graphs and message-passing iterative
decoding (MPID) initiated a flurry of research on iterative decoders and codes amenable
to such decoders that continues to this day. These developments led to new and practical approaches
not only in communications but also in signal processing and artificial intelligence. Many of
these developments can be explained nowadays with the help of concepts like the generalized 
distributive law as formulated by Aji and McEliece [13] or factor graphs and the sum-product 
algorithm (SPA) by Kschischang, Frey, and Loeliger [14, 15]. 

While MPID has had unparalleled success, it is fair to say that its behavior for the 
case of finite-length codes is, at present, not well understood and many results are based 
on simulations alone. Before delineating what is known about the finite-length case, let us
first turn to the infinite-length case. For LDPC codes with block length going to
infinity (where it is assumed that the length of the smallest cycle in the underlying Tanner 
graph also goes to infinity, or where at least the fraction of finite-length cycles vanishes) 
it turned out that there is an elegant analysis technique, the so-called density evolution: 
this technique was first introduced by Luby et al. [16] for the binary erasure channel and 
then by Richardson, Shokrollahi, and Urbanke [17, 18] for more general channels. These 
results were very valuable in guiding code designers on how to tweak LDPC codes into well-performing
(finite-length) irregular LDPC codes. There are, however, some drawbacks of
these techniques: firstly, it is not clear whether these results give the best finite-length irregular
codes, and secondly, and more importantly, they do not say whether a specific code exhibits an error
floor and, if so, where this error floor is.

Early techniques that tried to tackle the finite-length case focused on specific families of 
codes and/or restricted classes of channels. In that direction, let us mention the analysis 
of so-called cycle codes¹ by Wiberg [12], tail-biting trellises and graphs with a single cycle
by Anderson and Hladik [19], by Aji et al. [20], and by Forney et al. [21]. For the binary
erasure channel, influential work was done by Di et al. [22] utilizing the notion of stopping sets.
Finally, for more general channels, the notions of near-codewords, trapping sets, extrinsic message
degree (EMD), and instantons were used by MacKay and Postol [23], by Richardson [24], by
Tian et al. [25, 26], and by Chernyak et al. [27, 28], respectively, to empirically characterize 
problematic situations for MPID. 

¹Cycle codes are codes with a Tanner graph where all bit nodes have degree two.
A complete understanding of MPID of finite-length codes with finitely many iterations is
essentially given by computation trees [12], i.e. by the valid configurations of such computation 
trees. Some work on analyzing computation trees was done by Wiberg [12], with subsequent 
work by Frey et al. [29] and Forney et al. [30]. Although this approach is intuitively very 
appealing, it seems to be very difficult to get a simple characterization of the valid configurations
on computation trees, a necessary requirement if one wants to understand MPID. In
fact, only extremely simple codes were analyzed with this technique so far. 

Experimental results for codes of reasonable length and rate show that decision boundaries 
can be of a rather complex nature, a fact that makes the above-mentioned problems in 
trying to analyze the valid configurations on computation trees not completely unexpected. 
A complete understanding of MPID of a given code is probably illusory; therefore
we will settle here for a more modest goal.

In this paper we present an analysis technique for MPID of a given code. Although the 
underlying principle of our analysis technique is very simple, experimentally it seems to give 
very good predictions of the decoding behavior; in fact, it gives the correct answers for all the 
cases where MPID behavior is understood analytically. The predicted decision boundaries 
are hyperplanes in the log-likelihood ratio vector space and it turns out that the decision 
boundaries are exactly the same as the ones under so-called linear programming decoding 
(LPD) that was recently introduced by Feldman, Wainwright, and Karger [31, 32]. In the 
light of this coincidence one might actually argue that the various MPID algorithms are
nothing but low-complexity, very efficient, and aggressive LP solvers that most of the
time “decide” for the same (pseudo-)codeword as LPD, but not always.² We have done some
work towards showing the nearness of min-sum algorithm (MSA) decoding and LPD [33] but
in this paper we will not discuss this aspect any further.

The analysis technique that was mentioned in the previous paragraph will be called graph-cover
decoding (GCD): its name stems from the fact that during GCD all codewords in all
finite covers of a given Tanner graph are competing to be the best explanation of the received 
vector. Analyzing all the codes in all the finite covers seems at first to be an infeasible task. 
However, it turns out that they can be characterized by the so-called fundamental polytope. 
Among other things, we will see in this paper how this fundamental polytope unifies the 
notions of stopping sets, pseudo-codewords, near-codewords, and trapping sets.³

The outline of this paper is as follows. In Sec. 1.1 we will discuss the iterative decoding of
a simple code and show the underlying philosophy behind our analysis technique. After some
notational remarks in Sec. 1.2, the main part of the paper starts in Sec. 2, which introduces
graph covers and the fundamental polytope. In Sec. 3 we review MAPD/MLD of codes and,
by considering relaxations of optimization problems, we make the link to LPD. Then, in Sec. 4
we will show that GCD is essentially equivalent to LPD and we will see how GCD can be
seen as a model for MPID. Whereas Sec. 5 will discuss various descriptions and properties
of the fundamental polytope and cone, Sec. 6 will focus on a variety of pseudo-weights and
their properties. A simple upper bound on the AWGNC pseudo-weight will be presented in
Sec. 7, which implies a sub-linear asymptotic behavior of the AWGNC pseudo-weight for any
family of regular LDPC codes (under some mild conditions). Finally, in Sec. 8 we explain
the relationship of GCD to other concepts that have been used in the past to explain the

²When LPD decides for a pseudo-codeword that is not a codeword, the dynamical behavior of MPID
depends very much on the type of the MPID under consideration. 

³For more references on these topics, see also [34].
Figure 1: Tanner graph T of the length-3 code under consideration.


finite-length behavior of MPID, and in Sec. 9 we offer some conclusions and mention some
open problems. 


1.1 Motivating Example 

Because we are using binary codes, we can without loss of optimality assume that a decoding 
algorithm bases its decision on the log-likelihood ratio (LLR) vector which is given by the 
observed channel output sequence. The understanding of a particular decoding algorithm is 
then tightly related to the understanding of the decision regions in the space of LLR vectors.
While the visualization of decision regions is a very intuitive way of showing how a decoder
works (and of showing differences between different decoders), it is usually infeasible to show
all the aspects of the decision regions since practical codes have a length of several tens of bits
to several tens of thousands of bits, which implies that the space of LLR vectors has a dimension
of several tens to several tens of thousands.

However, some of the key differences between MAPD/MLD and iterative decoding can 
already be seen for very short codes. The aim of this section is to discuss such a very short 
code and to introduce an approximate analysis based on graph covers that explains the main 
characteristics of the decision regions of iterative decoding like sum-product algorithm (SPA)
and min-sum algorithm (MSA) decoding. (Note that the notation that we will
use in this subsection will be properly introduced in Sec. 1.2 and in later sections.) 

We consider a code C of length n = 3 defined by the parity-check matrix 

$$\mathbf{H} \triangleq \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad (1)$$


whose Tanner graph T = T(H) is depicted in Fig. 1. Because H has rank 3, the dimension 
of the code is 0 and therefore C contains only one codeword: 


$$C = \bigl\{ (x_1, x_2, x_3) \in \mathbb{F}_2^3 \bigm| (x_1, x_2, x_3) \cdot \mathbf{H}^{\mathsf{T}} = \mathbf{0} \bigr\} = \{(0,0,0)\}.$$
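The claim that the code contains only the all-zeros codeword can be checked by brute force. The following is a minimal sketch, assuming that the parity-check matrix in (1) has the rows (1,1,0), (1,1,1), (0,1,1); the function names `is_codeword` and `codewords` are ours and serve only as an illustration.

```python
from itertools import product

# Assumed parity-check matrix of the length-3 example code (one row per check).
H = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

def is_codeword(x, H):
    """Return True if x * H^T = 0 over F_2, i.e. if x fulfills every check."""
    return all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)

def codewords(H):
    """Enumerate all binary vectors of length n that fulfill every check."""
    n = len(H[0])
    return [x for x in product((0, 1), repeat=n) if is_codeword(x, H)]

print(codewords(H))  # [(0, 0, 0)]
```

Exhaustive enumeration is of course only feasible for very short codes; it is used here because the example has just eight candidate vectors.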


While it, at first, may seem strange to consider a zero-rate code, it is indeed an ideal candidate 
to investigate problematic behaviors of iterative decoding. Assume that we are using the code 
for data transmission over an additive white Gaussian noise channel (AWGNC) and that the 
LLR vector is $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \lambda_3)$.

Consider first block-wise MAPD (which is equivalent to block-wise MLD since we assume
that all codewords are equally likely to be transmitted). It is immediately apparent that for such
Figure 2: SPA decoding with at most 60 iterations of the code $C$ that is represented
by the Tanner graph $T$ in Fig. 1. The gray-scale indicates after how many iterations the
algorithm converged to the all-zeros codeword with the implication that in the black region
the decoder did not converge (see text for more details). From left to right: $(\lambda_1, \lambda_3)$-plane for
$\lambda_2 = -2.5, 0, +2.5$.

a decoder there is only one decision region: we decide $x = (0,0,0)$ independently of $\boldsymbol{\lambda}$.⁴

We now turn to MPID, more precisely decoding based on the SPA and MSA [14] where 
one iteration consists of updating the messages at all variable nodes and then updating
the messages at all check nodes. The SPA decoding convergence behavior as a function of
$(\lambda_1, \lambda_2, \lambda_3)$ is depicted in Fig. 2: the gray-scale indicates after how many iterations the SPA
converged to the all-zeros codeword. 
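The iteration schedule just described (all variable nodes first, then all check nodes) can be sketched as follows for the MSA. This is only an illustration under stated assumptions: the matrix `H` is our reading of the parity-check matrix of the example code, the function name `min_sum_decode` is hypothetical, and the sign convention is that a non-negative LLR favors the bit value 0.

```python
# Assumed parity-check matrix of the length-3 example code.
H = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]

def min_sum_decode(llr, H, iterations=60):
    """Flooding-schedule min-sum decoding; returns the final hard decision."""
    m, n = len(H), len(H[0])
    edges = [(j, i) for j in range(m) for i in range(n) if H[j][i]]
    nu = {e: 0.0 for e in edges}  # check-to-variable messages, initially zero
    for _ in range(iterations):
        # Variable-node update: channel LLR plus all other incoming messages.
        mu = {}
        for (j, i) in edges:
            mu[(j, i)] = llr[i] + sum(
                nu[(k, i)] for k in range(m) if H[k][i] and k != j
            )
        # Check-node update: product of signs times the minimum magnitude.
        for (j, i) in edges:
            others = [mu[(j, k)] for k in range(n) if H[j][k] and k != i]
            sign = 1.0
            for v in others:
                if v < 0:
                    sign = -sign
            nu[(j, i)] = sign * min(abs(v) for v in others)
    # Hard decision from the channel LLR plus all incoming check messages.
    totals = [
        llr[i] + sum(nu[(j, i)] for j in range(m) if H[j][i]) for i in range(n)
    ]
    return [0 if t >= 0 else 1 for t in totals]

print(min_sum_decode([2.0, 2.0, 2.0], H))  # [0, 0, 0]
```

The sketch returns only the decision after the last iteration; it does not implement the convergence criterion used for the plots.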

In practical applications, the SPA and the MSA are performed for a certain pre-defined 
number of iterations. The binary vector that is obtained at the end of these iterations is 
then considered to be the decision on the transmitted codeword. Very often, the following 
termination rule is used additionally: the algorithm terminates if the binary vector found by 
the algorithm is a codeword, i.e. the syndrome is the all-zeros vector. 

However, for our investigations of the code C we did not adopt this latter termination 
rule: the reason is that there are only eight binary vectors of length 3 and therefore it is not 
unlikely that at some point the algorithm obtains the all-zeros vector even if the internal state 
of the iterative process has not converged to a stable point.⁵ So, for obtaining the plots in
Fig. 2 we did the following: for each $(\lambda_1, \lambda_2, \lambda_3)$ point we performed 60 iterations of the SPA
and we considered the algorithm to have converged once the decision vector remained the
all-zeros codeword over subsequent iterations. Fig. 2 then shows the decision regions and the
convergence times under SPA decoding after performing 60 iterations. It is evident that these
decision regions are different from the decision regions for block-wise MAPD/MLD!
Indeed, the plots in Fig. 2 suggest that there is a decision boundary described by the equation
$\lambda_1 + \lambda_2 + \lambda_3 = 0$: for $\lambda_1 + \lambda_2 + \lambda_3 > 0$ the SPA does converge and for $\lambda_1 + \lambda_2 + \lambda_3 < 0$ the
SPA does not converge to the all-zeros codeword.

How can these differences in the decision regions between MAPD/MLD on the one hand 

⁴Note that a symbol-wise maximum a-posteriori decoder also has only one decision region: we decide
for $x_1 = 0$, $x_2 = 0$, $x_3 = 0$ independently of $\boldsymbol{\lambda}$.

⁵For reasonably long codes this is hardly an issue. E.g., for a rate-1/2 code of length 200, the probability
that the algorithm accidentally finds a codeword is $\approx 2^{-100}$.
Figure 3: Left: a possible triple cover $\tilde{T}$ of the Tanner graph $T$. Right: a non-zero codeword
of the code defined by $\tilde{T}$.




Figure 4: Left: computation tree with root $X_2$ after two iterations when decoding code $C$.
Right: computation tree with root $X_{2,1}$ after two iterations when decoding code $\tilde{C}$.


and MPID on the other hand be explained? In this paper we argue that the key difference 
between the block-wise MAPD/MLD (or symbol-wise MAPD/MLD) and any MPID algorithm
is the following: whereas the former algorithms use global information and constraints
to find the optimal solution, the latter algorithms base their decisions on information that 
was gathered by processing information locally. This locality, which on one hand leads to 
huge savings in terms of the number of computations needed, is on the other hand also the 
main weakness of any MPID algorithm. 

Let us briefly outline how we will use this global-vs-local perspective to obtain an understanding
of the differences between MAPD/MLD and MPID. Consider the code $\tilde{C}$ of length 9
that is defined by the Tanner graph $\tilde{T}$ in Fig. 3 (left). Assume that we use this code for data
transmission over an AWGNC and assume that at the receiver the hypothetical LLR vector
is

$$\tilde{\boldsymbol{\lambda}} = (\tilde{\lambda}_{1,1}{:}\tilde{\lambda}_{1,2}{:}\tilde{\lambda}_{1,3},\ \tilde{\lambda}_{2,1}{:}\tilde{\lambda}_{2,2}{:}\tilde{\lambda}_{2,3},\ \tilde{\lambda}_{3,1}{:}\tilde{\lambda}_{3,2}{:}\tilde{\lambda}_{3,3}).$$

In the same way that we used the SPA for decoding the code $C$ whose Tanner graph is shown
in Fig. 1, we can use the analogous message-passing-based decoding algorithm for decoding
the code $\tilde{C}$.

For both cases we can draw the computation trees [12]: Fig. 4 (left) shows the computation
tree with root $X_2$ after two iterations when decoding code $C$, whereas Fig. 4 (right) shows the
computation tree with root $X_{2,1}$ after two iterations when decoding code $\tilde{C}$. The topological
equivalence with the computation tree in Fig. 4 (left) might at first appear to be a coincidence.
However, it is not. The reason is that the Tanner graph $\tilde{T}$ has a special
relationship to the Tanner graph $T$; in fact, $\tilde{T}$ is a so-called 3-cover of $T$. This
means that $\tilde{T}$ has three times as many nodes as $T$ but is locally indistinguishable from $T$.

Moreover, if we assume that

$$\tilde{\boldsymbol{\lambda}} = (\lambda_1{:}\lambda_1{:}\lambda_1,\ \lambda_2{:}\lambda_2{:}\lambda_2,\ \lambda_3{:}\lambda_3{:}\lambda_3)$$

then not only are the computation trees topologically equivalent, but also the messages are
identical! Therefore, for this special choice of $\tilde{\boldsymbol{\lambda}}$ (in relation to a given $\boldsymbol{\lambda}$), the message-passing-based
decoding algorithm cannot distinguish whether it is decoding code $C$ or $\tilde{C}$. In fact, it
cannot distinguish whether it is decoding code $C$ or any code defined by any graph cover of $T$. The
harmful effect of the codes that are given by the graph covers is that they contain codewords
that cannot be explained as liftings of codewords in $C$. E.g., code $\tilde{C}$ contains the codeword
(0:0:0, 0:0:0, 0:0:0), which is a lifting of the codeword (0,0,0) in $C$. However, code $\tilde{C}$ also contains
the codeword (1:1:0, 1:1:0, 1:1:0), cf. Fig. 3 (right), which is not a lifting of a codeword
in $C$.⁶
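The existence of codewords that are not liftings can be verified by exhaustive search in a small cover. The sketch below makes two assumptions that should be stated clearly: the base code has checks on (x1, x2), (x1, x2, x3), and (x2, x3), and the triple cover is built from cyclic shifts that we chose ourselves for illustration; this need not be the cover drawn in Fig. 3.

```python
from itertools import product

M = 3  # covering degree

# Each base check is written as {variable index: cyclic shift}. Copy m of a
# check is joined to copy (m + shift) mod M of each listed variable.
# The shifts are a hypothetical choice made for this illustration.
checks = [
    {0: 0, 1: 0},        # check 1 involves x1, x2
    {0: 0, 1: 1, 2: 2},  # check 2 involves x1, x2, x3
    {1: 0, 2: 0},        # check 3 involves x2, x3
]

def in_cover_code(x):
    """x[i][m] is the value of copy m of variable i; test all lifted checks."""
    return all(
        sum(x[i][(m + s) % M] for i, s in chk.items()) % 2 == 0
        for chk in checks
        for m in range(M)
    )

cover_code = []
for bits in product((0, 1), repeat=3 * M):
    x = [bits[i * M:(i + 1) * M] for i in range(3)]
    if in_cover_code(x):
        cover_code.append(bits)

# The base code contains only the all-zeros word, so every non-zero codeword
# of the cover code is not a lifting of a base codeword.
non_liftings = [c for c in cover_code if any(c)]
print(len(cover_code), len(non_liftings))
```

For this particular choice of shifts the vector (1:1:0, 1:1:0, 1:1:0) fulfills all nine lifted checks, in agreement with the discussion above.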

We emphasize two crucial observations: 

• In principle, locally operating decoding algorithms cannot distinguish if they are operating
on a Tanner graph T or any finite cover of this graph as, for example, the cubic
cover depicted in Fig. 3 (left).

• In general, the binary codes defined by finite covers of a Tanner graph support codewords
that are not liftings of codewords in the original Tanner graph. Such a codeword is
indicated in Fig. 3 (right) for the cubic cover in Fig. 3 (left).

It is clear that any locally operating MPID will automatically take into account all possible
codewords in all finite graph covers of the original graph. In other words, whereas in
MAPD/MLD all the codewords are competing to be the best explanation of the
received vector, under MPID all codewords in all finite graph covers compete to be the best
explanation of the received vector. In the case of our example code, the existence of non-zero
codewords in finite covers of the original graph explains to a large extent the observed behavior
of SPA- and MSA-based decoding: indeed, for the specific code at hand it can be shown that
any non-zero codeword in a finite cover of T (like the codeword in the triple cover shown in
Fig. 3 (right)) has the same effect as a virtually present, all-one codeword.

At first glance it seems to be a formidable task to characterize all possible codewords
introduced by the union of finite covers of any degree. (The number of finite covers
of a graph grows faster than exponentially with the covering degree.) However, it turns out
that this union can be elegantly described and compactly represented in terms of
the original Tanner graph.

Let us emphasize that this paper uses graph covers as an analysis technique. In the 
past, there have been various researchers who have used graph covers (sometimes also called 
graph liftings) but they used them for constructing LDPC codes that have some desirable 
symmetries, see e.g. Tanner et al. [35, 36]. 

⁶In total, $\tilde{C}$ contains four codewords; three of them are not liftings of any codeword of $C$.
Before concluding this motivating example let us mention some unexplained behavior of
SPA decoding for larger LLR values, see Fig. 5. Besides the decision boundary $\lambda_1 + \lambda_2 + \lambda_3 = 0$
that we have already discussed above, there appears an oval-shaped region where the SPA
seems to have a problem in converging to the all-zeros codeword. Upon applying a slight
modification to the SPA decoder, however, these oval-shaped regions disappear, see Fig. 6. The
modification that we applied was the following. Letting $\mu^{(t)}$ and $\nu^{(t)}$ be the LLR messages
at iteration $t$ from the bit nodes to the check nodes and from check to bit nodes, respectively,
the usual SPA message updates can be written as $\mu^{(t)} := f(\nu^{(t-1)})$ and $\nu^{(t)} := g(\mu^{(t)})$ for
some suitably chosen functions $f$ and $g$. The modified SPA message update rules are then
$\mu^{(t)} := \alpha \cdot f(\nu^{(t-1)}) + (1 - \alpha) \cdot \mu^{(t-1)}$ and $\nu^{(t)} := g(\mu^{(t)})$ for some $\alpha$ where $0 \le \alpha \le 1$.⁷ Note
that this modified SPA still operates locally and so it cannot distinguish whether it is decoding the
code described by the base Tanner graph or any of the codes described by the finite covers of
the base Tanner graph.
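The modified update can be read as an averaging of LLR messages over consecutive iterations. The following minimal sketch assumes that the bit-to-check messages are replaced by a weighted average of the usual update and the previous bit-to-check messages, with weight `alpha` on the usual update; the function name `damped_update` and the toy update function `f` are hypothetical.

```python
def damped_update(mu_prev, nu_prev, f, alpha):
    """One modified bit-to-check update.

    mu_prev: previous bit-to-check LLR messages (list of floats)
    nu_prev: previous check-to-bit LLR messages (list of floats)
    f:       the usual update, mapping check-to-bit to bit-to-check messages
    alpha:   averaging weight with 0 <= alpha <= 1 (assumed convention)
    """
    fresh = f(nu_prev)  # what the unmodified decoder would send
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(fresh, mu_prev)]

def f(nu):
    """Toy stand-in for the usual update, used only to exercise the function."""
    return [2.0 * v for v in nu]

print(damped_update([1.0, 1.0], [1.0, 2.0], f, 0.5))  # [1.5, 2.5]
```

With `alpha = 1` the usual update is recovered, whereas `alpha = 0` freezes the messages; the averaging acts on LLR messages only and does not change the local nature of the algorithm.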

1.2 Notation 

This section discusses the various notations that we will use in this paper. We start with
some sets. We let $\mathbb{Z}$, $\mathbb{Z}_{+}$, $\mathbb{Z}_{++}$, $\mathbb{Q}$, $\mathbb{Q}_{+}$, $\mathbb{Q}_{++}$, $\mathbb{R}$, $\mathbb{R}_{+}$, and $\mathbb{R}_{++}$ be the set of integers, the set of
non-negative integers, the set of positive integers, the set of rational numbers, the set of non-negative
rational numbers, the set of positive rational numbers, the set of real numbers, the set of non-negative real
numbers, and the set of positive real numbers, respectively. We let $\mathbb{F}_2 \triangleq \{0,1\}$ be the Galois
field with two elements; as a set, $\mathbb{F}_2$ will be considered as a subset of $\mathbb{R}$. The size of a set $\mathcal{S}$
is denoted by $|\mathcal{S}|$.

In the following, all scalars, entries of vectors, and entries of matrices will be considered to
be in $\mathbb{R}$, unless noted otherwise. So, if an addition or a multiplication is not in the real field,
we will indicate this, e.g. by writing $a + b$ (in $\mathbb{F}_2$) or $a \cdot b$ (in $\mathbb{F}_2$). Moreover, when $\mathcal{T} \subseteq \mathbb{Z}^N$
and $\mathcal{S} \subseteq \mathbb{F}_2^N$ then an expression like $\mathcal{T} \subseteq \mathcal{S}$ (in $\mathbb{F}_2$) means that $t$ (mod 2) lies in $\mathcal{S}$ for all
$t \in \mathcal{T}$. As usually done in coding theory, we use only row vectors. An inequality of the form
$a \le b$ involving two vectors of length $N$ is to be understood component-wise, i.e. $a_i \le b_i$ for
all $1 \le i \le N$. We let $\mathbf{1}_N$ be the all-ones row-vector of length $N$ and the matrix $\mathbf{I}_N$ be the identity
matrix of size $N \times N$; when the length (size) of this vector (matrix) is obvious from the
context, we will omit the index. The support supp($x$) of a vector $x$ will be the set of indices
where $x$ is nonzero.

Square brackets will be used in different ways: if $L$ is some positive integer then $[L]$ will
denote the set $\{1, 2, \ldots, L\}$. If $\mathbf{A}$ is some matrix then $[\mathbf{A}]_{k,\ell}$ will denote the element in the
$k$-th row and $\ell$-th column of $\mathbf{A}$. If $S$ is a statement (for example $x \in C$) then $[S] = 1$ if $S$ is
true and $[S] = 0$ otherwise.

By $\langle x, y \rangle = \sum_i x_i y_i$ we will denote the standard inner product of two vectors having the
same length. The $\ell_1$-norm of a vector $x$ is $\|x\|_1 = \sum_i |x_i|$, the $\ell_2$-norm of a vector $x$ is
$\|x\|_2 = \sqrt{\sum_i |x_i|^2}$, and the $\ell_\infty$-norm (also called the max-norm) of a vector $x$ is $\|x\|_\infty = \max_i |x_i|$.
Note that $\|x\|_1 = \langle x, \mathbf{1} \rangle$ if and only if $x \ge 0$. Let $x, y \in \mathbb{F}_2^N$ be two vectors of length $N$.
The Hamming weight $w_{\mathrm{H}}(x)$ of $x$ is the number of non-zero positions of $x$, and the Hamming
distance $d_{\mathrm{H}}(x, y)$ between $x$ and $y$ is the number of positions where $x$ and $y$ disagree.
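These quantities are elementary to compute. The following sketch spells them out for vectors given as Python sequences; all function names are ours.

```python
import math

def inner(x, y):
    """Standard inner product of two vectors of the same length."""
    return sum(a * b for a, b in zip(x, y))

def norm1(x):
    return sum(abs(a) for a in x)

def norm2(x):
    return math.sqrt(sum(a * a for a in x))

def norm_inf(x):
    return max(abs(a) for a in x)

def hamming_weight(x):
    """Number of non-zero positions of x."""
    return sum(1 for a in x if a != 0)

def hamming_distance(x, y):
    """Number of positions where x and y disagree."""
    return sum(1 for a, b in zip(x, y) if a != b)

ones = [1, 1, 1]
print(norm1([1, 2, 0]) == inner([1, 2, 0], ones))    # True: all entries >= 0
print(norm1([1, -2, 0]) == inner([1, -2, 0], ones))  # False: one negative entry
```

The last two lines illustrate the remark that the ℓ1-norm equals the inner product with the all-ones vector exactly when the vector is component-wise non-negative.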

⁷Let us mention that while discussing trapping sets and their influence, Laendner and Milenkovic [37]
observed a similar slight change in behavior upon modifying the SPA slightly. However, whereas they are
“averaging” the probability messages, we are “averaging” the LLR messages.





Figure 5: SPA decoding with at most 60 iterations of the code $C$ that is represented by the
Tanner graph $T$ in Fig. 1. The gray-scale indicates after how many iterations the algorithm
converged to the all-zeros codeword with the implication that in the black region the decoder
did not converge (see text for more details). From top-left to bottom-right: $(\lambda_1, \lambda_3)$-plane for
$\lambda_2 = -10, -5, -2.5, 0, +2.5, +5, +10$.
Figure 6: Modified SPA decoding ($\alpha = 0.85$) with at most 60 iterations of the code $C$ that
is represented by the Tanner graph $T$ in Fig. 1. The gray-scale indicates after how many
iterations the algorithm converged to the all-zeros codeword with the implication that in
the black region the decoder did not converge (see text for more details). From top-left to
bottom-right: $(\lambda_1, \lambda_3)$-plane for $\lambda_2 = -10, -5, -2.5, 0, +2.5, +5, +10$.
Unless stated otherwise, the code $C$ will be a binary linear code of length $n$ and will
be defined by some $m \times n$ parity-check matrix $\mathbf{H}$, i.e. $C = \{ x \in \mathbb{F}_2^n \mid x \mathbf{H}^{\mathsf{T}} = \mathbf{0} \}$.⁸ We let
$\mathcal{I} = \mathcal{I}(\mathbf{H}) = \{1, \ldots, n\}$ be the set of codeword indices, $\mathcal{J} = \mathcal{J}(\mathbf{H}) = \{1, \ldots, m\}$ be the set of
check indices, $\mathcal{J}_i = \mathcal{J}_i(\mathbf{H}) = \{ j \in \mathcal{J} \mid [\mathbf{H}]_{j,i} = 1 \}$ be the set of check indices that involve the
$i$-th codeword position, and $\mathcal{I}_j = \mathcal{I}_j(\mathbf{H}) = \{ i \in \mathcal{I} \mid [\mathbf{H}]_{j,i} = 1 \}$ be the set of codeword positions
that are involved in the $j$-th check.

If $x \in \mathbb{F}_2^n$ and $\mathcal{S} \subseteq \mathcal{I}$, we let $x_{\mathcal{S}}$ be the sub-vector of those positions of $x$ whose indices
are elements of $\mathcal{S}$, i.e. the projection of $x$ onto $\mathcal{S}$. Similarly, $C_{\mathcal{S}} \triangleq \{ x_{\mathcal{S}} \mid x \in C \}$ will be the
projection of $C$ onto the index set $\mathcal{S}$.⁹ A $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular binary LDPC code is a code that
has a parity-check matrix $\mathbf{H}$ where all columns have weight $w_{\mathrm{col}}$ and all rows have weight
$w_{\mathrm{row}}$. The dimension of a code $C$ is the logarithm (to the base 2) of the number of codewords
and the rate is the ratio of the dimension divided by the length. Note that the rate of
$C$ is at least $1 - |\mathcal{J}|/n$, with equality if and only if $\mathbf{H}$ has full row rank.
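Since the dimension of a binary linear code equals the length minus the rank of the parity-check matrix over the field with two elements, dimension and rate can be obtained by Gaussian elimination. The following is a minimal sketch; the function names and the two test matrices are our own illustrative choices.

```python
def rank_gf2(H):
    """Rank of a binary matrix over F_2 via Gaussian elimination."""
    rows = [list(r) for r in H]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below position `rank` with a 1 in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this column in all other rows (addition in F_2 is XOR).
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def dimension_and_rate(H):
    """Return (k, k / n) for the code with parity-check matrix H."""
    n = len(H[0])
    k = n - rank_gf2(H)
    return k, k / n

# Illustrative matrices: the second one has a redundant third row.
H_full = [[1, 1, 1, 0], [0, 1, 1, 1]]
H_redundant = [[1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 1]]
print(dimension_and_rate(H_full))       # (2, 0.5)
print(dimension_and_rate(H_redundant))  # (2, 0.5)
```

For `H_redundant` the rate 1/2 is strictly larger than 1 - 3/4, because the third row is the sum of the first two and the matrix therefore does not have full row rank.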

If $C$ is a code then the minimum Hamming weight $w_{\mathrm{H}}^{\min}(C)$ is the minimum Hamming
weight of all nonzero codewords of $C$, and the minimum Hamming distance $d_{\min}(C)$ is the
minimum Hamming distance between any two distinct codewords of $C$. It is well known that
for linear codes $w_{\mathrm{H}}^{\min}(C) = d_{\min}(C)$. A code of length $n$, dimension $k$, and minimum distance
$d$ will be called an $[n, k, d]$ code.
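For short codes both quantities can be computed by exhaustive search, which also makes their equality for linear codes easy to observe. The sketch below uses an illustrative 2 x 4 parity-check matrix of our own choosing; the function names are hypothetical.

```python
from itertools import product

def codewords(H):
    """All binary vectors x with x * H^T = 0 over F_2 (brute force)."""
    n = len(H[0])
    return [
        x for x in product((0, 1), repeat=n)
        if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)
    ]

def min_weight(code):
    """Minimum Hamming weight over all nonzero codewords."""
    return min(sum(x) for x in code if any(x))

def min_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(
        sum(a != b for a, b in zip(x, y))
        for k, x in enumerate(code)
        for y in code[k + 1:]
    )

H = [[1, 1, 1, 0], [0, 1, 1, 1]]  # illustrative matrix
code = codewords(H)
print(len(code), min_weight(code), min_distance(code))  # 4 2 2
```

The code found here has length 4, dimension 2, and minimum distance 2, i.e. it is a [4, 2, 2] code in the notation just introduced.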

Let us introduce some notions from convex geometry (see e.g. [39]). Let $x^{(1)}, \ldots, x^{(k)}$ be
$k$ points in $\mathbb{R}^N$. A point of the form $\theta_1 x^{(1)} + \cdots + \theta_k x^{(k)}$ with $\theta_1 + \cdots + \theta_k = 1$ and $\theta_i \ge 0$,
$i \in [k]$, is called a convex combination of $x^{(1)}, \ldots, x^{(k)}$. A set $\mathcal{S} \subseteq \mathbb{R}^N$ is called convex if every
possible convex combination of two points of $\mathcal{S}$ is in $\mathcal{S}$. By conv($\mathcal{S}$) we denote the convex hull
of the set $\mathcal{S}$, i.e. the set that consists of all possible convex combinations of all the points in
$\mathcal{S}$; equivalently, conv($\mathcal{S}$) is the smallest convex set that contains $\mathcal{S}$.

Again, let $x^{(1)}, \ldots, x^{(k)}$ be $k$ points in $\mathbb{R}^N$. A point of the form $\theta_1 x^{(1)} + \cdots + \theta_k x^{(k)}$ with
$\theta_i \ge 0$, $i \in [k]$, is called a conic combination of $x^{(1)}, \ldots, x^{(k)}$. A set $\mathcal{K} \subseteq \mathbb{R}^N$ is called a cone
if every possible conic combination of two points of $\mathcal{K}$ is in $\mathcal{K}$. A cone $\mathcal{K}$ is called a proper
cone if it satisfies the following conditions: $\mathcal{K}$ is convex, $\mathcal{K}$ is closed, $\mathcal{K}$ is solid (i.e. it has
nonempty interior), and $\mathcal{K}$ is pointed (i.e., it contains no line or, equivalently, if $x \in \mathcal{K}$ and
$-x \in \mathcal{K}$, then $x = 0$). By conic($\mathcal{S}$) we denote the conic hull of the set $\mathcal{S}$, i.e. the set that
consists of all possible conic combinations of all the points in $\mathcal{S}$; equivalently, conic($\mathcal{S}$) is the
smallest conic set that contains $\mathcal{S}$.

Let us now introduce polytopes and polyhedra. On the one hand, a polytope in $\mathbb{R}^N$ is
defined to be the convex hull of a finite set of points in $\mathbb{R}^N$. On the other hand, a polyhedron
$\mathcal{P}$ in $\mathbb{R}^N$ is defined as the solution set of a finite number of linear equalities and inequalities:

$$\mathcal{P} \triangleq \bigl\{ x \in \mathbb{R}^N \bigm| \langle a^{(j)}, x \rangle \le b_j,\ j \in [m],\ \langle c^{(j)}, x \rangle = d_j,\ j \in [p] \bigr\},$$

where $a^{(j)}$, $j \in [m]$, and $c^{(j)}$, $j \in [p]$, are vectors of the same length as $x$ and $b_j$, $j \in [m]$, and
$d_j$, $j \in [p]$, are scalars. From this definition we see that a polyhedron is the intersection of a
finite number of half-spaces and hyperplanes and it is also easy to see that a polyhedron is a
convex set. By the Weyl-Minkowski Theorem, cf. e.g. [40, p. 55], a bounded polyhedron is a
polytope.

⁸Note the following convention: a row index of $\mathbf{H}$ will be denoted by $j$ and a column index of $\mathbf{H}$ will be
denoted by $i$.

⁹In coding language, this is often called puncturing the code $C$ at positions $\mathcal{I} \setminus \mathcal{S}$ [38].
Figure 7: Left: base graph G. Right: sample of possible 2-covers of G.




Figure 8: Left: a possible 3-cover of G. Right: a possible M-cover of G. 

An undirected graph $G = G(\mathcal{V}(G), \mathcal{E}(G))$ consists of a vertex-set $\mathcal{V}(G)$ and an edge-set
$\mathcal{E}(G)$ whereby the elements of $\mathcal{E}(G)$ are 2-subsets of $\mathcal{V}(G)$. By a graph (without further
qualifications) we will always mean an undirected graph without loops and multiple edges.
The smallest length of any cycle will be called the girth $g(G)$ and the largest graph distance
between any two vertices will be called the diameter $\delta(G)$. If the graph has more than one
component then $\delta(G) = \infty$. The neighborhood $\partial(v)$ of a vertex $v \in \mathcal{V}(G)$ is the set of vertices of
$G$ that are adjacent to $v$. It follows that $|\partial(v)|$ is the degree of the vertex $v$.
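Girth and diameter of a small graph can both be obtained from breadth-first searches started at every vertex. The sketch below assumes the graph is given as an adjacency dictionary mapping each vertex to the list of its neighbors; the function names are ours.

```python
from collections import deque
import math

def bfs(adj, root):
    """Return (distances, parents) from root in an undirected simple graph."""
    dist, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], parent[v] = dist[u] + 1, u
                queue.append(v)
    return dist, parent

def diameter(adj):
    """Largest graph distance; infinite if there is more than one component."""
    best = 0
    for root in adj:
        dist, _ = bfs(adj, root)
        if len(dist) < len(adj):
            return math.inf
        best = max(best, max(dist.values()))
    return best

def girth(adj):
    """Length of a shortest cycle; infinite if the graph has no cycle."""
    best = math.inf
    for root in adj:
        dist, parent = bfs(adj, root)
        for u in dist:
            for v in adj[u]:
                # An edge outside the BFS tree closes a walk through the root
                # that contains a cycle of length at most dist[u] + dist[v] + 1.
                if parent[u] != v and parent[v] != u:
                    best = min(best, dist[u] + dist[v] + 1)
    return best

cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(girth(cycle5), diameter(cycle5))  # 5 2
```

Taking the minimum over all roots gives the exact girth, since for a root lying on a shortest cycle the bound is attained.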


2 Graph Covers and the Fundamental Polytope 

After recalling the definitions of finite graph covers and Tanner graphs, we will introduce the 
fundamental polytope, a notion that will turn out to be the crucial definition for the rest of 
the present paper. 

Definition 1 (Graph cover, see e.g. [41, 42]) An unramified, finite cover, or, simply, a
cover of a (base) graph $G$ is a graph $\tilde{G}$ along with a surjective map $\phi : \tilde{G} \to G$ which is a
graph homomorphism, i.e., which takes adjacent vertices of $\tilde{G}$ to adjacent vertices of $G$, such
that for each vertex $v \in \mathcal{V}(G)$ and each $\tilde{v} \in \phi^{-1}(v)$, the neighborhood $\partial(\tilde{v})$ of $\tilde{v}$ is mapped
bijectively to $\partial(v)$. For a positive integer $M$, an $M$-cover of $G$ is an unramified finite cover
$\phi : \tilde{G} \to G$ such that for each vertex $v \in \mathcal{V}(G)$ of $G$, $\phi^{-1}(v)$ contains exactly $M$ vertices of
$\tilde{G}$. An $M$-cover of $G$ is sometimes also called an $M$-sheeted covering of $G$ or a cover of $G$ of
degree $M$.¹⁰ □

A consequence of this definition is that if $\tilde{G}$ is an $M$-cover of $G$ then we can choose $\mathcal{V}(\tilde{G})$ to
be $\mathcal{V}(\tilde{G}) = \mathcal{V}(G) \times [M]$: if $(v, m) \in \mathcal{V}(\tilde{G})$ then $\phi((v, m)) = v$ and if $\{(v_1, m_1), (v_2, m_2)\} \in \mathcal{E}(\tilde{G})$
then $\phi(\{(v_1, m_1), (v_2, m_2)\}) = \{v_1, v_2\}$. Another consequence is that any $M_2$-cover of any
$M_1$-cover of the base graph is an $(M_1 \cdot M_2)$-cover of the base graph.

¹⁰It is important not to confuse the degree of a covering and the degree of a vertex.
Figure 9: Left: Tanner graph T(H) of the simple binary linear code in Ex. 4. Middle: Possible
3-cover of T(H). The shading of the symbol nodes indicates the codeword x found in Ex. 5. 
Right: Possible M-cover of T(H). 


Example 2 Let $G$ be a (base) graph with 4 vertices and 5 edges as shown in Fig. 7 (left).
Figs. 7 (right) and 8 (left) show possible 2- and 3-covers of $G$, respectively. Any $M$-cover of
$G$ is entirely specified by $|\mathcal{E}(G)|$ permutations: this is represented by Fig. 8 (right). Note that
any 2-cover of $G$ must have $8 = 2 \cdot 4$ vertices and $10 = 2 \cdot 5$ edges and any 3-cover of $G$ must
have $12 = 3 \cdot 4$ vertices and $15 = 3 \cdot 5$ edges. □
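The construction of an M-cover from one permutation per base edge can be sketched as follows. The base graph below is a hypothetical graph with 4 vertices and 5 edges (it need not be the graph of Fig. 7), and the permutations are drawn at random; the function names are ours.

```python
import random

def random_cover(vertices, edges, M, seed=0):
    """Build an M-cover by drawing one permutation of range(M) per base edge."""
    rng = random.Random(seed)
    cover_vertices = [(v, m) for v in vertices for m in range(M)]
    cover_edges = []
    for (u, v) in edges:
        perm = list(range(M))
        rng.shuffle(perm)
        # Copy m of u is joined to copy perm[m] of v.
        cover_edges.extend(((u, m), (v, perm[m])) for m in range(M))
    return cover_vertices, cover_edges

def degree(vertex, edges):
    """Number of edges that contain the given vertex."""
    return sum(vertex in e for e in edges)

# Hypothetical base graph with 4 vertices and 5 edges.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]

for M in (2, 3):
    cover_V, cover_E = random_cover(V, E, M)
    print(M, len(cover_V), len(cover_E))
```

Because every permutation is a bijection, each copy of a base vertex inherits exactly the degree of that vertex, which is the local indistinguishability required by Definition 1.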

In general, a graph $G$ has $(M!)^{|\mathcal{E}(G)|}$ possible $M$-covers, some of which might be isomorphic.
Moreover, an $M$-cover of $G$ may consist of several components even if $G$ consists of only one
component. Before we can consider graph covers of Tanner graphs, we briefly recall the
definition of Tanner graphs.

Definition 3 (Tanner graph [5, 11, 14]) To a binary parity-check matrix $\mathbf{H}$ that defines
the code $C$ we can associate a bipartite graph $T(\mathbf{H})$, the so-called Tanner graph of $\mathbf{H}$. This
graph has vertex set $\mathcal{V} = \{X_i \mid i \in \mathcal{I}\} \cup \{B_j \mid j \in \mathcal{J}\}$ and edge set $\mathcal{E} = \{(X_i, B_j) \mid i \in
\mathcal{I}, j \in \mathcal{J}_i\} = \{(X_i, B_j) \mid j \in \mathcal{J}, i \in \mathcal{I}_j\}$. On the other hand, given a Tanner graph $T$ we can
associate to $T$ a code $C(T)$ with parity-check matrix $\mathbf{H}(T)$ in the obvious manner. □

We will use some language from behavioral theory [43]: an assignment of $\mathbb{F}_2$-values to the
variable nodes will be called a configuration, and a configuration that fulfills all the checks
will be called valid. In that sense, a codeword corresponds to a valid configuration and a code
corresponds to the set of all valid configurations.

From the above definition of a Tanner graph it follows that $\partial(X_i) = \{B_j \mid j \in \mathcal{J}_i\}$ for all
$i \in \mathcal{I}(\mathbf{H})$ and $\partial(B_j) = \{X_i \mid i \in \mathcal{I}_j\}$ for all $j \in \mathcal{J}(\mathbf{H})$. Moreover, the degree $|\partial(X_i)|$ of the
node $X_i$ is equal to the Hamming weight of the $i$-th column of $\mathbf{H}$ and the degree $|\partial(B_j)|$ of the
node $B_j$ is equal to the Hamming weight of the $j$-th row of $\mathbf{H}$. Therefore, Tanner graphs of
LDPC codes are sparse because of the sparseness of the parity-check matrix of LDPC codes.
Example 4 Let $\mathcal{C}$ be a binary $[4, 2]$ code with parity-check matrix^{11}

$$\mathbf{H} \triangleq \begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}.$$

Obviously, $\mathcal{C} = \{(0,0,0,0), (0,1,1,0), (1,0,1,1), (1,1,0,1)\}$, $\mathcal{J} = \{1, 2\}$, $\mathcal{J}_1 = \{1\}$,
$\mathcal{J}_2 = \{1, 2\}$, $\mathcal{J}_3 = \{1, 2\}$, $\mathcal{J}_4 = \{2\}$, $\mathcal{I} = \{1, 2, 3, 4\}$, $\mathcal{I}_1 = \{1, 2, 3\}$, and $\mathcal{I}_2 = \{2, 3, 4\}$.
The Tanner graph $\mathsf{T}(\mathbf{H})$ that is associated to $\mathbf{H}$ is shown in Fig. 9 (left).

An $M$-fold cover $\tilde{\mathsf{T}}$ (as shown in Fig. 9 (right)) of $\mathsf{T}$ is specified by defining the
permutations $\pi_{1,1}$, $\pi_{1,2}$, $\pi_{1,3}$ (corresponding to the first row of $\mathbf{H}$) and the permutations
$\pi_{2,2}$, $\pi_{2,3}$, $\pi_{2,4}$ (corresponding to the second row of $\mathbf{H}$). □
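
As a quick sanity check of Example 4, the following sketch recomputes the index sets and the codewords by brute force. It assumes the parity-check matrix has rows (1,1,1,0) and (0,1,1,1), which is what the index sets listed in the example imply; the variable names are illustrative.

```python
from itertools import product

# Assumed parity-check matrix, consistent with I_1 = {1,2,3} and I_2 = {2,3,4}.
H = [[1, 1, 1, 0],
     [0, 1, 1, 1]]
n = len(H[0])

# I[j]: symbols taking part in check j; J[i]: checks involving symbol i (1-based).
I = {j + 1: {i + 1 for i in range(n) if H[j][i]} for j in range(len(H))}
J = {i + 1: {j + 1 for j in range(len(H)) if H[j][i]} for i in range(n)}

def is_valid(x):
    """A configuration is valid if it fulfills every check (arithmetic in F_2)."""
    return all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)

code = [x for x in product((0, 1), repeat=n) if is_valid(x)]
print(code)
```

The degree of symbol node $X_i$ is `len(J[i])` and the degree of check node $B_j$ is `len(I[j])`, matching the column and row weights of the matrix.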

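Let $\mathcal{C}$ be a binary code with parity-check matrix $\mathbf{H}$ and Tanner graph $\mathsf{T} = \mathsf{T}(\mathbf{H})$. For a
positive integer $M$, let $\tilde{\mathsf{T}}$ be an arbitrary $M$-fold cover of $\mathsf{T}$, let $\tilde{\mathcal{C}} = \mathcal{C}(\tilde{\mathsf{T}})$ be the binary code
described by $\tilde{\mathsf{T}}$, and let the codeword positions of $\tilde{\mathcal{C}}$ be indexed by $\tilde{\mathcal{I}} = \mathcal{I} \times [M]$ and the check
equations by $\tilde{\mathcal{J}} = \mathcal{J} \times [M]$.

Knowing the graph $\mathsf{T}$, the graph $\tilde{\mathsf{T}}$ is completely specified by defining for all $j \in \mathcal{J}$,
$i \in \mathcal{I}_j$ the permutations $\pi_{j,i}$ that map $[M]$ onto itself. The meaning of $\pi_{j,i}(m)$, $m \in [M]$, is
the following: the $m$-th copy of check node $j$ is connected to the $\pi_{j,i}(m)$-th copy of codeword
symbol $X_i$, i.e. check node $B_{j,m}$ is connected to codeword symbol $X_{i,\pi_{j,i}(m)}$. It follows that
$\tilde{\mathbf{x}} \in \tilde{\mathcal{C}}$ if and only if

$$\sum_{i \in \mathcal{I}_j} \tilde{x}_{i,\pi_{j,i}(m)} = 0 \quad (\text{in } \mathbb{F}_2) \qquad (2)$$

for all $(j, m) \in \tilde{\mathcal{J}}$. The parity-check matrix $\tilde{\mathbf{H}}$ that expresses this fact can be defined as
follows. Let the entries of $\tilde{\mathbf{H}}$ be indexed by $(j, m) \in \tilde{\mathcal{J}}$ and $(i, m') \in \tilde{\mathcal{I}}$. Then

$$\tilde{h}_{(j,m),(i,m')} \triangleq \begin{cases} 1 & \text{if } i \in \mathcal{I}_j \text{ and } m' = \pi_{j,i}(m) \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$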
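
The rule in (3) translates directly into a few lines of code. The following is a minimal sketch, assuming 0-based indices, a row order $(j, m)$ and a column order $(i, m')$ with the copy index running fastest; the function name `lift_parity_check` and the particular permutations are hypothetical choices for illustration.

```python
def lift_parity_check(H, perms, M):
    """Parity-check matrix of the M-cover specified by the permutations.

    perms[(j, i)] is a tuple pi of length M: copy m of check j is
    connected to copy pi[m] of symbol i (all indices 0-based).
    """
    rows, cols = len(H), len(H[0])
    H_cover = [[0] * (cols * M) for _ in range(rows * M)]
    for j in range(rows):
        for i in range(cols):
            if H[j][i]:
                pi = perms[(j, i)]
                for m in range(M):
                    # Entry ((j, m), (i, m')) is 1 exactly when m' = pi(m).
                    H_cover[j * M + m][i * M + pi[m]] = 1
    return H_cover

# Base matrix with rows (1,1,1,0) and (0,1,1,1); one permutation per nonzero entry.
H = [[1, 1, 1, 0],
     [0, 1, 1, 1]]
perms = {
    (0, 0): (1, 2, 0), (0, 1): (0, 1, 2), (0, 2): (1, 2, 0),
    (1, 1): (2, 0, 1), (1, 2): (2, 0, 1), (1, 3): (0, 1, 2),
}
H_cover = lift_parity_check(H, perms, 3)
for row in H_cover:
    print(row)
```

Each nonzero entry of the base matrix is replaced by an $M \times M$ permutation matrix and each zero entry by an all-zero block, so row and column weights are preserved.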


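Example 5 We continue Ex. 4. The parity-check matrix $\tilde{\mathbf{H}} = \mathbf{H}(\tilde{\mathsf{T}})$ associated to a possible
3-fold cover Tanner graph $\tilde{\mathsf{T}}$ as shown in Fig. 9 (middle) looks like

$$\tilde{\mathbf{H}} = \left(\begin{array}{ccc|ccc|ccc|ccc}
0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\hline
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1
\end{array}\right).$$

This parity-check matrix defines a code $\tilde{\mathcal{C}} = \mathcal{C}(\tilde{\mathsf{T}})$: e.g. the configuration
$\tilde{\mathbf{x}} = (1{:}1{:}0,\ 0{:}1{:}1,\ 0{:}1{:}1,\ 0{:}0{:}0)$ that is highlighted in Fig. 9 (middle) is a codeword in this code.
Note also that $\tilde{\mathcal{C}}$ contains the liftings of all codewords to $\tilde{\mathsf{T}}$, namely if $(x_1, x_2, x_3, x_4) \in \mathcal{C}$ then
$(x_1{:}x_1{:}x_1,\ x_2{:}x_2{:}x_2,\ x_3{:}x_3{:}x_3,\ x_4{:}x_4{:}x_4) \in \tilde{\mathcal{C}}$. The last statement follows from the following
argument: since $\mathsf{T}$ and $\tilde{\mathsf{T}}$ look locally the same, the fact that a codeword $\mathbf{x}$ in $\mathcal{C}$ fulfills the
checks imposed by $\mathsf{T}$ implies that the lifting of $\mathbf{x}$ to $\tilde{\mathsf{T}}$ fulfills all the checks imposed by $\tilde{\mathsf{T}}$,
i.e. that it is a codeword in $\tilde{\mathcal{C}}$. □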
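
The two claims of Example 5 can be verified mechanically. The sketch below assumes that the $6 \times 12$ matrix reads row by row as written in the code (each nonzero entry of the base matrix replaced by a $3 \times 3$ permutation block, consistent with (3)), and that the base matrix has rows (1,1,1,0) and (0,1,1,1); helper names are illustrative.

```python
from itertools import product

H = [[1, 1, 1, 0],
     [0, 1, 1, 1]]

# Assumed row-by-row reading of the 3-fold cover matrix of Example 5.
H_cover = [
    [0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
]

def in_code(matrix, x):
    """True if x fulfills all checks of the matrix over F_2."""
    return all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in matrix)

def lift(x, M):
    """Repeat every entry M times, i.e. (x1:x1:x1, x2:x2:x2, ...)."""
    return [xi for xi in x for _ in range(M)]

# The highlighted configuration (1:1:0, 0:1:1, 0:1:1, 0:0:0).
x_tilde = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0]
print(in_code(H_cover, x_tilde))

base_code = [x for x in product((0, 1), repeat=4) if in_code(H, x)]
print(all(in_code(H_cover, lift(x, 3)) for x in base_code))
```

Note that `x_tilde` is not the lifting of any base codeword, which is what makes it interesting in the examples that follow.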

^{11} Note that this is the same parity-check matrix as in the Example after Th. 2 in [44].
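Definition 6 Let $\mathcal{C}$ be a binary linear (base) code with parity-check matrix $\mathbf{H}$ and let
$\mathsf{T} = \mathsf{T}(\mathbf{H})$ be the corresponding Tanner graph. For any positive integer $M$, let $\tilde{\mathsf{T}}$ be an $M$-fold
cover of $\mathsf{T}$ and let $\tilde{\mathcal{C}} = \mathcal{C}(\tilde{\mathsf{T}})$. The (scaled) pseudo-codeword associated to $\tilde{\mathbf{x}} \in \tilde{\mathcal{C}}$ is the rational
vector $\boldsymbol{\omega}(\tilde{\mathbf{x}}) \triangleq (\omega_1(\tilde{\mathbf{x}}), \omega_2(\tilde{\mathbf{x}}), \ldots, \omega_n(\tilde{\mathbf{x}}))$ with

$$\omega_i(\tilde{\mathbf{x}}) \triangleq \frac{1}{M} \sum_{m \in [M]} \tilde{x}_{i,m}, \qquad (4)$$

where the sum is taken in $\mathbb{R}$ (not in $\mathbb{F}_2$). In fact, any multiple (by a positive scalar) of $\boldsymbol{\omega}(\tilde{\mathbf{x}})$
will be called a pseudo-codeword associated with $\tilde{\mathbf{x}}$. Because of its importance, we give a special
name to the vector $M \cdot \boldsymbol{\omega}(\tilde{\mathbf{x}})$, namely we will call it the unscaled pseudo-codeword associated
to $\tilde{\mathbf{x}}$. Additionally, we define $\boldsymbol{\omega}(\tilde{\mathcal{C}})$ to be the set

$$\boldsymbol{\omega}(\tilde{\mathcal{C}}) \triangleq \{\boldsymbol{\omega}(\tilde{\mathbf{x}}) \mid \tilde{\mathbf{x}} \in \tilde{\mathcal{C}}\}.$$

Obviously, $\boldsymbol{\omega}(\tilde{\mathcal{C}}) \subseteq [0,1]^n \cap \mathbb{Q}^n$. □

Note that whereas a pseudo-codeword as defined in Def. 6 has length $|\mathcal{I}(\mathbf{H})|$, i.e. equal to
the length of the code $\mathcal{C}$, a codeword like $\tilde{\mathbf{c}} \in \tilde{\mathcal{C}}$ has length $M \cdot |\mathcal{I}(\mathbf{H})|$ where $M$ is the degree
of the corresponding cover Tanner graph. Because a Tanner graph $\mathsf{T}$ is a 1-cover of itself, we see
that any codeword is also a pseudo-codeword.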

Example 7 We continue Ex. 4. We saw that $\tilde{\mathbf{c}} = (1{:}1{:}0,\ 0{:}1{:}1,\ 0{:}1{:}1,\ 0{:}0{:}0)$ was a codeword
of the code $\tilde{\mathcal{C}}$. Applying Def. 6 we see that the corresponding pseudo-codeword is
$\boldsymbol{\omega}(\tilde{\mathbf{c}}) = (2/3, 2/3, 2/3, 0)$. (Note that this pseudo-codeword cannot be written as a convex
combination of the codewords in $\mathcal{C}$.) The corresponding unscaled pseudo-codeword is
$3 \cdot \boldsymbol{\omega}(\tilde{\mathbf{c}}) = (2, 2, 2, 0)$ and comparing this vector with Fig. 9 (middle), we see the intuitive meaning
of its components: the first component corresponds to the number of shaded variable nodes
$X_{1,m}$, $m \in [M]$, the second component corresponds to the number of shaded variable nodes
$X_{2,m}$, $m \in [M]$, etc. □
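
The computation in Example 7 can be reproduced with exact rational arithmetic. This is a minimal sketch; the function name `pseudo_codeword` is an illustrative choice, and the argument for the parenthetical remark (every codeword of the base code has equal first and fourth components) is a simple observation added here, not a statement from the text.

```python
from fractions import Fraction

def pseudo_codeword(x_tilde, M):
    """Scaled pseudo-codeword of Def. 6: average the M copies of each symbol."""
    n = len(x_tilde) // M
    return [Fraction(sum(x_tilde[i * M:(i + 1) * M]), M) for i in range(n)]

# Codeword (1:1:0, 0:1:1, 0:1:1, 0:0:0) of the 3-fold cover.
c_tilde = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0]
omega = pseudo_codeword(c_tilde, 3)
unscaled = [3 * w for w in omega]
print(omega, unscaled)

# Every codeword of the base code satisfies x_1 = x_4, hence so does every
# convex combination of codewords; omega violates this, so it lies
# outside the convex hull of the code.
code = [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 1)]
print(all(x[0] == x[3] for x in code), omega[0] != omega[3])
```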


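We would like to investigate the question of whether it is possible to characterize the union of
the sets of all (scaled) pseudo-codewords obtained from all finite covers of the Tanner graph of a
binary linear code, i.e. we would like to understand the set

$$\tilde{\mathcal{Q}}(\mathbf{H}) \triangleq \bigcup_{M \in \mathbb{Z}_{++}} \ \bigcup_{\tilde{\mathsf{T}}:\ \tilde{\mathsf{T}} \text{ is an } M\text{-fold cover of } \mathsf{T}(\mathbf{H})} \ \bigcup_{\tilde{\mathbf{x}} \in \mathcal{C}(\tilde{\mathsf{T}})} \{(M, \tilde{\mathsf{T}}, \tilde{\mathbf{x}})\} \qquad (5)$$

and its "projection"^{12}

$$\mathcal{Q}(\mathbf{H}) \triangleq \bigcup_{(M, \tilde{\mathsf{T}}, \tilde{\mathbf{x}}) \in \tilde{\mathcal{Q}}(\mathbf{H})} \{\boldsymbol{\omega}(\tilde{\mathbf{x}})\}. \qquad (6)$$

^{12} We could have defined
$\mathcal{Q}(\mathbf{H}) \triangleq \bigcup_{(M, \tilde{\mathsf{T}}, \tilde{\mathbf{x}}) \in \tilde{\mathcal{Q}}(\mathbf{H})} \{(M, \tilde{\mathsf{T}}, \boldsymbol{\omega}(\tilde{\mathbf{x}}))\}$,
but the definition of $\mathcal{Q}(\mathbf{H})$ in (6) contains enough information for our purposes.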
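From the properties of $\boldsymbol{\omega}(\tilde{\mathcal{C}})$ it follows that $\mathcal{Q}(\mathbf{H}) \subseteq [0,1]^n \cap \mathbb{Q}^n$. Observe that

$$\mathcal{Q}(\mathbf{H}) = \bigcup_{\tilde{\mathsf{T}}:\ \tilde{\mathsf{T}} \text{ is a finite-cover graph of } \mathsf{T}(\mathbf{H})} \boldsymbol{\omega}(\mathcal{C}(\tilde{\mathsf{T}})). \qquad (7)$$

This set has a surprisingly simple characterization. It will turn out that $\mathcal{Q}(\mathbf{H})$ is essentially
given by the fundamental polytope introduced in the next definition. Before we turn to that
definition, let us observe that the code $\mathcal{C}$ can be written as the intersection

$$\mathcal{C} = \bigcap_{j \in \mathcal{J}} \mathcal{C}_j$$

of the codes

$$\mathcal{C}_j \triangleq \mathcal{C}_j(\mathbf{H}) \triangleq \{\mathbf{x} \in \mathbb{F}_2^n \mid \langle \mathbf{x}, \mathbf{h}_j \rangle = 0 \ (\text{in } \mathbb{F}_2)\} = \{\mathbf{x} \in \mathbb{F}_2^n \mid \langle \mathbf{x}_{\mathcal{I}_j}, \mathbf{1} \rangle = 0 \ (\text{in } \mathbb{F}_2)\}, \qquad (8)$$

where for each $j \in \mathcal{J}$ we let $\mathbf{h}_j$ be the $j$-th row of $\mathbf{H}$. For $j \in \mathcal{J}$, we will also use the codes

$$\mathcal{C}'_j \triangleq \mathcal{C}'_j(\mathbf{H}) \triangleq \{\mathbf{x}' \in \mathbb{F}_2^{|\mathcal{I}_j|} \mid \langle \mathbf{x}', (\mathbf{h}_j)_{\mathcal{I}_j} \rangle = 0 \ (\text{in } \mathbb{F}_2)\} = \{\mathbf{x}' \in \mathbb{F}_2^{|\mathcal{I}_j|} \mid \langle \mathbf{x}', \mathbf{1} \rangle = 0 \ (\text{in } \mathbb{F}_2)\}. \qquad (9)$$

The codes $\mathcal{C}_j$ and $\mathcal{C}'_j$ are related as follows. First, $\mathcal{C}'_j$ is the projection of $\mathcal{C}_j$ onto $\mathcal{I}_j$, i.e.
$\mathcal{C}'_j = (\mathcal{C}_j)_{\mathcal{I}_j}$. Secondly, the convex hulls of $\mathcal{C}_j$ and of $\mathcal{C}'_j$ fulfill

$$\operatorname{conv}(\mathcal{C}_j) = \{\boldsymbol{\omega} \in \mathbb{R}^n \mid \mathbf{0} \le \boldsymbol{\omega} \le \mathbf{1},\ \boldsymbol{\omega}_{\mathcal{I}_j} \in \operatorname{conv}(\mathcal{C}'_j)\}. \qquad (10)$$

We are now ready for the main definition of this paper.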

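Definition 8 The fundamental polytope $\mathcal{P} \triangleq \mathcal{P}(\mathbf{H})$ of $\mathbf{H}$ is defined to be the set

$$\mathcal{P} \triangleq \bigcap_{j \in \mathcal{J}} \operatorname{conv}(\mathcal{C}_j) \qquad (11)$$

$$= \bigcap_{j \in \mathcal{J}} \{\boldsymbol{\omega} \in \mathbb{R}^n \mid \mathbf{0} \le \boldsymbol{\omega} \le \mathbf{1},\ \boldsymbol{\omega}_{\mathcal{I}_j} \in \operatorname{conv}(\mathcal{C}'_j)\} \qquad (12)$$

$$= [0,1]^n \cap \bigcap_{j \in \mathcal{J}} \{\boldsymbol{\omega} \in \mathbb{R}^n \mid \boldsymbol{\omega}_{\mathcal{I}_j} \in \operatorname{conv}(\mathcal{C}'_j)\}. \qquad (13)$$

□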


As can be seen from the notation $\mathcal{P}(\mathbf{H})$, the fundamental polytope is a function of the
parity-check matrix $\mathbf{H}$ that describes the code $\mathcal{C}$. This means that different parity-check
matrices for the same code can (and usually do) yield different fundamental polytopes.

In the same way as all codewords of a code described by a parity-check matrix $\mathbf{H}$ are all
the valid configurations in a Tanner graph $\mathsf{T}(\mathbf{H})$, we see that (13) yields a similar description
for all pseudo-codewords, i.e. for all the vectors that lie in the fundamental polytope $\mathcal{P}(\mathbf{H})$.
Indeed, we redefine the Tanner graph as follows: each bit node $X_i$ is now labeled $\omega_i$ and
can take on values in the interval $[0, 1]$, and each check node $B_j$ is replaced by the indicator
function of the convex hull of $\mathcal{C}'_j$. (We can use the results of Lemmas 25 and 26 in Sec. 5 to
formulate these indicator functions.)
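Figure 10: $\mathcal{C}_j(\mathbf{H})$, $j \in \mathcal{J}$, and $\mathcal{P}(\mathbf{H})$ for the parity-check matrix $\mathbf{H}$ in (1). Top left:
$\operatorname{conv}(\mathcal{C}_1(\mathbf{H}))$. Top right: $\operatorname{conv}(\mathcal{C}_2(\mathbf{H}))$. Bottom left: $\operatorname{conv}(\mathcal{C}_3(\mathbf{H}))$. Bottom right:
$\mathcal{P}(\mathbf{H}) = \bigcap_{j \in \mathcal{J}} \operatorname{conv}(\mathcal{C}_j(\mathbf{H}))$.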


Example 9 We continue discussing the code that was introduced in Sec. 1.1 whose parity-check
matrix is shown in (1). For this parity-check matrix the codes $\mathcal{C}_j$, $j \in \mathcal{J}$, turn out to be

$$\mathcal{C}_1 = \{(0,0), (1,1)\} \times \{(0), (1)\} = \{(0,0,0), (0,0,1), (1,1,0), (1,1,1)\},$$

$$\mathcal{C}_2 = \{(0,0,0), (0,1,1), (1,0,1), (1,1,0)\},$$

$$\mathcal{C}_3 = \{(0), (1)\} \times \{(0,0), (1,1)\} = \{(0,0,0), (1,0,0), (0,1,1), (1,1,1)\}.$$

We can easily check that $\mathcal{C} = \bigcap_{j \in \mathcal{J}} \mathcal{C}_j = \{(0,0,0)\}$. Fig. 10 visualizes these codes, their convex
hulls, and the fundamental polytope
$\mathcal{P}(\mathbf{H}) = \bigcap_{j \in \mathcal{J}} \operatorname{conv}(\mathcal{C}_j) = \{(\omega, \omega, \omega) \mid 0 \le \omega \le 2/3\}$. Note
that here the fundamental polytope has only two vertices: $(0,0,0)$ and $(2/3, 2/3, 2/3)$, where the
former is the pseudo-codeword corresponding to the all-zeros assignment in any finite cover
and where the latter is e.g. the pseudo-codeword corresponding to the configuration in the
triple cover shown in Fig. 3 (right).

Moreover, using Prop. 10 below, it can be shown that $\mathcal{Q}(\mathbf{H})$ equals the set of all the rational
points of $\mathcal{P}(\mathbf{H})$. Accepting this fact, we can also verify the statement made in Prop. 10 that
all vertices of $\mathcal{P}(\mathbf{H})$ are in $\mathcal{Q}(\mathbf{H})$. □
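
Membership in the fundamental polytope can be tested numerically for small examples. The sketch below assumes the well-known inequality description of the convex hull of a single parity-check code (for every odd-sized subset $S$ of a check's support, the sum over $S$ minus the sum over the rest is at most $|S| - 1$); whether this matches the formulation of Lemmas 25 and 26 is not verified here. The matrix `H1` is read off from the codes $\mathcal{C}_1$, $\mathcal{C}_2$, $\mathcal{C}_3$ of Example 9, and all names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def in_fundamental_polytope(H, omega):
    """Check box constraints and, per check, all odd-subset inequalities."""
    if any(w < 0 or w > 1 for w in omega):
        return False
    for row in H:
        support = [i for i, h in enumerate(row) if h]
        for size in range(1, len(support) + 1, 2):  # odd subset sizes only
            for S in combinations(support, size):
                inside = sum(omega[i] for i in S)
                outside = sum(omega[i] for i in support if i not in S)
                if inside - outside > size - 1:
                    return False
    return True

# Checks x1+x2 = 0, x1+x2+x3 = 0, x2+x3 = 0 (assumed form of matrix (1)).
H1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
t = Fraction(2, 3)
print(in_fundamental_polytope(H1, (t, t, t)))                      # vertex
print(in_fundamental_polytope(H1, (Fraction(3, 4),) * 3))          # beyond 2/3
print(in_fundamental_polytope(H1, (Fraction(1, 2), Fraction(1, 2), 0)))
```

Exact fractions are used so that points on the boundary, such as the vertex $(2/3, 2/3, 2/3)$, are classified without rounding issues.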

Note that usually the effective dimension of the fundamental polytope equals the length 
n of the code. In cases where the parity-check matrix has checks that involve only one or two 
codeword symbols, there is a reduction in effective dimensionality. The above example is a 
witness of this fact. 

After having seen the definition of the fundamental polytope we are in a position to
formulate the main theorem of this paper which relates the set $\mathcal{Q}(\mathbf{H})$ with the fundamental
polytope $\mathcal{P}(\mathbf{H})$.

Proposition 10 Let $\mathcal{C}$ be an arbitrary binary linear code and let $\mathbf{H}$ be its parity-check matrix.
It holds that

$$\mathcal{Q}(\mathbf{H}) = \mathcal{P}(\mathbf{H}) \cap \mathbb{Q}^n, \qquad (14)$$

$$\mathcal{P}(\mathbf{H}) = \overline{\mathcal{Q}(\mathbf{H})}, \qquad (15)$$

where the over-bar denotes the closure of the corresponding set under the usual topology of
$\mathbb{R}^n$. Moreover, all vertices of $\mathcal{P}(\mathbf{H})$ are in $\mathcal{Q}(\mathbf{H})$.

Proof: See Sec. A.1. □

Before finishing this section let us mention that the fundamental polytope and related 
concepts can not only be defined for a code whose Tanner graph consists only of single parity- 
check codes but also for codes described by a Tanner graph where some or all of the check 
nodes represent more complicated subcodes or for codes described by a factor graph that 
represents a tail-biting trellis. The generalization is relatively straightforward and will not be 
discussed any further in this paper. 


3 Channels, MAP Decoding, and LP Decoding 


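We consider the problem of data communication over a memoryless channel with input
alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and with channel law $P_{Y|X}(y|x)$. In this paper we only consider
channels with binary input, i.e. with $\mathcal{X} = \{0, 1\}$. In order to achieve reliable communication
over such a channel, we will use a binary code $\mathcal{C} \subseteq \mathbb{F}_2^n$ of length $n$ and rate $R$ that is defined
by some parity-check matrix $\mathbf{H}$. We assume that every codeword $\mathbf{x} \in \mathcal{C}$ is transmitted with
equal probability, i.e. $P_{\mathbf{X}}(\mathbf{x}) = 2^{-nR}$ if $\mathbf{x} \in \mathcal{C}$ and $P_{\mathbf{X}}(\mathbf{x}) = 0$ otherwise, where $R$ is the rate
of the code.

Upon observing the output $\mathbf{Y} = \mathbf{y}$, block-wise maximum a-posteriori decoding (MAPD)
can be formulated as the following optimization problem^{13}

$$\hat{\mathbf{x}}^{\mathrm{MAPD}}(\mathbf{y}) = \arg\max_{\mathbf{x} \in \mathbb{F}_2^n} P_{\mathbf{X},\mathbf{Y}}(\mathbf{x}, \mathbf{y}) = \arg\max_{\mathbf{x} \in \mathbb{F}_2^n} P_{\mathbf{X}}(\mathbf{x}) \cdot P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$$

$$= \arg\max_{\mathbf{x} \in \mathcal{C}} P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \arg\min_{\mathbf{x} \in \mathcal{C}} \bigl(-\log P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\bigr), \qquad (16)$$

where $P_{\mathbf{X},\mathbf{Y}}(\mathbf{x}, \mathbf{y}) = P_{\mathbf{X}}(\mathbf{x}) \cdot P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$ is the joint pmf/pdf of the coded (but
unmodulated) channel input $\mathbf{X}$ and the channel output $\mathbf{Y}$. Ties are resolved in a systematic
way.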

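^{13} Note that the resulting decision rule also equals the maximum-likelihood decision rule because all
possible codewords $\mathbf{x}$ occur with the same probability.

In the following we will use the fact that
$P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{i \in \mathcal{I}} P_{Y_i|X_i}(y_i|x_i) = \prod_{i \in \mathcal{I}} P_{Y|X}(y_i|x_i)$
holds for memoryless channels (that are used without feedback). The random variable

$$\Lambda_i \triangleq \Lambda_i(Y_i) \triangleq \log \frac{P_{Y|X}(Y_i|0)}{P_{Y|X}(Y_i|1)} \qquad (17)$$

with realization $\lambda_i$ will be called the channel log-likelihood ratio for the $i$-th codeword
symbol.^{14} Block-wise MAPD can therefore be rewritten to read

$$\hat{\mathbf{x}}^{\mathrm{MAPD}}(\mathbf{y}) = \arg\min_{\mathbf{x} \in \mathcal{C}} \log \frac{P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{0})}{P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})} = \arg\min_{\mathbf{x} \in \mathcal{C}} \sum_{i \in \mathcal{I}} x_i \lambda_i = \arg\min_{\mathbf{x} \in \mathcal{C}} \langle \mathbf{x}, \boldsymbol{\lambda} \rangle, \qquad (18)$$

where ties are resolved in a systematic manner.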
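
For a very small code, the rule in (18) can be evaluated by exhaustive search. The sketch below uses the four codewords of the $[4, 2]$ code of Ex. 4; the log-likelihood ratio vectors are hypothetical values chosen for illustration, and "first minimizer wins" stands in for the systematic tie-breaking rule.

```python
def map_decode(code, llr):
    """Return the codeword minimizing <x, lambda>.

    min() returns the first minimizer, so ties are resolved in favor
    of the codeword that appears earlier in the list.
    """
    return min(code, key=lambda x: sum(xi * li for xi, li in zip(x, llr)))

code = [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 1)]

# Hypothetical LLRs: positive values favor 0, negative values favor 1.
llr = [1.5, -2.0, -0.5, 0.8]
print(map_decode(code, llr))

# With all LLRs positive, the all-zeros codeword has the smallest cost.
print(map_decode(code, [0.3, 1.0, 2.0, 0.1]))
```

The exhaustive search visits every codeword, so it is only feasible for toy examples; this is exactly the complexity issue that motivates the relaxations discussed next.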

From this expression it is not far to linear programming decoding (LPD) [31, 32].
In a first step, let us reformulate (18) as

$$\hat{\mathbf{x}}^{\mathrm{MAPD}}(\mathbf{y}) = \arg\min_{\mathbf{x} \in \operatorname{conv}(\mathcal{C})} \langle \mathbf{x}, \boldsymbol{\lambda} \rangle, \qquad (19)$$

where ties are resolved in a systematic manner. This expression follows from two facts:
all codewords in $\mathcal{C}$ are vertices of $\operatorname{conv}(\mathcal{C})$ and, because the cost function is linear, the set
of optimal solutions must always include at least one vertex of $\operatorname{conv}(\mathcal{C})$.^{15} The resulting
optimization problem on the right-hand side of (19) is a linear program (LP). Although it is
of course desirable to solve such a problem, for arbitrary codes this problem turns out to be
hard, a reason being that the number of inequalities needed to describe $\operatorname{conv}(\mathcal{C})$ usually grows
exponentially in the block length. A standard way in optimization theory to circumvent such
complexity issues is to solve a closely related problem: instead of minimizing over $\operatorname{conv}(\mathcal{C})$
we will minimize over a relaxation polytope $\operatorname{relax}(\operatorname{conv}(\mathcal{C}))$ of this polytope, i.e. over a larger
polytope:

$$\hat{\boldsymbol{\omega}}^{\mathrm{LPD}}(\mathbf{y}) = \arg\min_{\boldsymbol{\omega} \in \operatorname{relax}(\operatorname{conv}(\mathcal{C}))} \langle \boldsymbol{\omega}, \boldsymbol{\lambda} \rangle. \qquad (20)$$

Of course, this new polytope should have a low description complexity, yet be a good
approximation of $\operatorname{conv}(\mathcal{C})$ so that it is highly likely that $\hat{\boldsymbol{\omega}}^{\mathrm{LPD}}(\mathbf{y}) = \hat{\mathbf{x}}^{\mathrm{MAPD}}(\mathbf{y})$. In particular, all
codewords in $\mathcal{C}$ should be vertices of $\operatorname{relax}(\operatorname{conv}(\mathcal{C}))$.

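Probably one of the easiest ways of obtaining a reasonable relaxation is the following.
Observe that

$$\mathcal{C} = \bigcap_{j \in \mathcal{J}(\mathbf{H})} \mathcal{C}_j(\mathbf{H}),$$

where $\mathcal{C}_j(\mathbf{H})$ was defined in (8). Consider now the set

$$\mathcal{R}(\mathbf{H}) \triangleq \bigcap_{j \in \mathcal{J}(\mathbf{H})} \operatorname{conv}(\mathcal{C}_j(\mathbf{H})). \qquad (21)$$

The fact that the set $\mathcal{R}(\mathbf{H})$ is a relaxation of $\operatorname{conv}(\mathcal{C})$ can be seen from the following chain
of reasoning: firstly, the set $\mathcal{R}(\mathbf{H})$ is the intersection of convex sets and is therefore convex
itself; secondly, the set $\mathcal{R}(\mathbf{H})$ contains all codewords in $\mathcal{C}$; thirdly, $\operatorname{conv}(\mathcal{C})$ is the smallest
convex set that contains $\mathcal{C}$; combining these three observations leads to the conclusion that
$\operatorname{conv}(\mathcal{C}) \subseteq \mathcal{R}(\mathbf{H})$. Note that $\operatorname{conv}(\mathcal{C}) = \mathcal{R}(\mathbf{H})$ is possible, though strict inclusion is what
usually happens. Of course, the set $\mathcal{R}(\mathbf{H})$ in (21) equals the set $\mathcal{P}(\mathbf{H})$ defined in
Def. 8: the solution of the LP decoder when choosing $\operatorname{relax}(\operatorname{conv}(\mathcal{C})) = \mathcal{R}(\mathbf{H}) = \mathcal{P}(\mathbf{H})$ will
henceforth be called $\hat{\boldsymbol{\omega}}^{\mathrm{LPD}(\mathbf{H})}(\mathbf{y})$.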
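
The effect of the relaxation can be seen on the trivial code of Ex. 9, whose fundamental polytope has the two vertices $(0,0,0)$ and $(2/3, 2/3, 2/3)$. Since a linear cost function attains its minimum over a polytope at a vertex, the following sketch simply compares the two vertices instead of calling an LP solver; the log-likelihood ratio vectors are hypothetical and the helper names are illustrative.

```python
from fractions import Fraction

def inner(w, llr):
    return sum(wi * li for wi, li in zip(w, llr))

def decode_over(points, llr):
    """Return the first point in `points` that minimizes <point, llr>."""
    return min(points, key=lambda w: inner(w, llr))

two_thirds = Fraction(2, 3)
code = [(0, 0, 0)]                                  # the trivial code
polytope_vertices = [(0, 0, 0), (two_thirds,) * 3]  # vertices of P(H), Ex. 9

# Hypothetical LLR vector whose components sum to a negative value.
llr = [1, -3, 1]
print(decode_over(code, llr))               # MAPD: only one codeword exists
print(decode_over(polytope_vertices, llr))  # LPD: picks the pseudo-codeword

# When the LLRs sum to a positive value, both decoders agree.
print(decode_over(polytope_vertices, [1, -1, 1]))
```

For this code the LP decoder returns the non-codeword vertex exactly when the sum of the log-likelihood ratios is negative, which illustrates that the relaxation can introduce optimal points outside $\operatorname{conv}(\mathcal{C})$.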

The next definition introduces another class of relaxations. 

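^{14} Because of the memoryless property of the channel it also follows that
$p_{\boldsymbol{\Lambda}|\mathbf{X}}(\boldsymbol{\lambda}|\mathbf{x}) = \prod_{i \in \mathcal{I}} p_{\Lambda_i|X_i}(\lambda_i|x_i) = \prod_{i \in \mathcal{I}} p_{\Lambda|X}(\lambda_i|x_i)$.

^{15} In case a whole face of $\operatorname{conv}(\mathcal{C})$ is optimal we decide in favor of one of the vertices in it.

Definition 11 Let $\mathbf{H}$ be an arbitrary parity-check matrix that defines a code $\mathcal{C}$. For some
$r \ge 1$, let

$$\mathcal{R}_r(\mathbf{H}) \triangleq \bigcap_{\mathbf{h}} \operatorname{conv}(\mathcal{C}(\mathbf{h})),$$

where the intersection is over all vectors $\mathbf{h} \in \mathbb{F}_2^n$ that can be written as the modulo-2 sum of at
most $r$ rows of $\mathbf{H}$. We call $\mathcal{R}_r(\mathbf{H})$ the $r$-th relaxation of $\operatorname{conv}(\mathcal{C})$ with respect to $\mathbf{H}$. Note that
$\mathcal{R}_r(\mathbf{H}) = \mathcal{P}(\mathbf{H}')$ where $\mathbf{H}'$ is the parity-check matrix consisting of all rows of $\mathbf{H}$, the modulo-2
sums of all pairs of rows of $\mathbf{H}$, ..., the modulo-2 sums of all $r$-tuples of rows of $\mathbf{H}$. □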

Some of the consequences of this definition will be explored in Sec. 8.3. 

Let us define three channels that will be of prime interest in this paper: the binary-input
additive white Gaussian noise channel (BI-AWGNC or simply AWGNC), the binary
symmetric channel (BSC), and the binary erasure channel (BEC).


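Example 12 The binary-input additive white Gaussian noise channel (BI-AWGNC) with
input energy per channel symbol $E_c$ and noise power $\sigma^2$ has output alphabet $\mathcal{Y} = \mathbb{R}$ and
channel law^{16}

$$P_{\bar{Y}|X}(\bar{y}|x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{(\bar{y} - \sqrt{E_c})^2}{2\sigma^2}\right) & (\text{if } x = 0) \\[2ex] \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{(\bar{y} + \sqrt{E_c})^2}{2\sigma^2}\right) & (\text{if } x = 1). \end{cases} \qquad (22)$$

Defining the input energy per information symbol to be $E_b$, this quantity is related to $E_c$
through $E_c = R \cdot E_b$. Introducing $N_0 = 2\sigma^2$, two different signal-to-noise ratios can be defined,
namely $\mathrm{SNR}_b = E_b/N_0$ and $\mathrm{SNR}_c = E_c/N_0$, which are related through $\mathrm{SNR}_c = R \cdot \mathrm{SNR}_b$.
Defining $\bar{x}(x) \triangleq \sqrt{E_c} \cdot (1 - 2x)$ for $x \in \mathbb{F}_2 \subset \mathbb{R}$ we can write (22) as

$$P_{\bar{Y}|X}(\bar{y}|x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\bar{y} - \bar{x}(x))^2}{2\sigma^2}\right).$$

If $\mathbf{x} \in \mathbb{F}_2^n \subset \mathbb{R}^n$ is the codeword to be transmitted, then the modulated word is
$\bar{\mathbf{x}} = \bar{\mathbf{x}}(\mathbf{x}) = \sqrt{E_c} \cdot (\mathbf{1} - 2\mathbf{x})$. So, upon sending $\bar{X}_i$ we receive $\bar{Y}_i = \bar{X}_i + \bar{Z}_i$ where $\bar{Z}_i$ is normally
distributed with mean zero and variance $\sigma^2$. Therefore, $\bar{Y}_i$ given $X_i = 0$ is normally distributed with
mean $+\sqrt{E_c}$ and variance $\sigma^2$, whereas $\bar{Y}_i$ given $X_i = 1$ is normally distributed with mean
$-\sqrt{E_c}$ and variance $\sigma^2$. For the BI-AWGNC we have a simple relationship between $\bar{Y}$ and
$\Lambda$, namely by simplifying the definition of the LLR for the $i$-th symbol we see that

$$\Lambda_i \triangleq \Lambda_i(\bar{Y}_i) \triangleq \log \frac{P_{\bar{Y}_i|X_i}(\bar{Y}_i|0)}{P_{\bar{Y}_i|X_i}(\bar{Y}_i|1)} = \log \frac{P_{\bar{Y}_i|\bar{X}_i}(\bar{Y}_i|{+\sqrt{E_c}})}{P_{\bar{Y}_i|\bar{X}_i}(\bar{Y}_i|{-\sqrt{E_c}})} = \frac{4\sqrt{R E_b}}{N_0} \cdot \bar{Y}_i,$$

i.e. $\Lambda$ is just a scaled version of $\bar{Y}$. From this, it can easily be calculated that $\Lambda_i$ given $X_i = 0$
is normally distributed with mean $4R \cdot \mathrm{SNR}_b$ and variance $8R \cdot \mathrm{SNR}_b$, whereas $\Lambda_i$ given $X_i = 1$
is normally distributed with mean $-4R \cdot \mathrm{SNR}_b$ and variance $8R \cdot \mathrm{SNR}_b$.

^{16} In the case of the AWGNC we will denote the output symbols by $\bar{Y}_i$ and not by $Y_i$ so that all (random)
variables that can be represented in a signal space have an over-bar.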
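Finally, let us note that block-wise MAPD can not only be written as in (16) and (18)
but also as

$$\hat{\mathbf{x}}^{\mathrm{MAPD}}(\bar{\mathbf{y}}) = \arg\max_{\mathbf{x} \in \mathcal{C}} \sum_{i \in \mathcal{I}} \bar{x}_i(x_i) \lambda_i = \arg\max_{\mathbf{x} \in \mathcal{C}} \langle \bar{\mathbf{x}}(\mathbf{x}), \boldsymbol{\lambda} \rangle, \qquad (23)$$

i.e. decoding can be written as finding the $\bar{\mathbf{x}}(\mathbf{x})$, $\mathbf{x} \in \mathcal{C}$, with the largest standard inner
product with $\boldsymbol{\lambda}$. The decoding rule in (23) is also known as the correlation decoding rule. □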
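
The scaling relation between the channel output and the log-likelihood ratio in Example 12 can be checked numerically. The sketch below compares the LLR computed directly from the two Gaussian densities in (22) with the closed form $4\sqrt{E_c}\,\bar{y}/N_0$; the numerical values of $R$, $E_b$ and $\sigma^2$ are hypothetical, and the function names are illustrative.

```python
import math

def awgn_llr(y, Ec, sigma2):
    """LLR obtained directly from the two Gaussian densities in (22)."""
    def log_pdf(mean):
        return -0.5 * math.log(2 * math.pi * sigma2) - (y - mean) ** 2 / (2 * sigma2)
    return log_pdf(+math.sqrt(Ec)) - log_pdf(-math.sqrt(Ec))

def awgn_llr_closed_form(y, Ec, sigma2):
    """Closed form 4*sqrt(Ec)/N0 * y with N0 = 2*sigma^2."""
    N0 = 2 * sigma2
    return 4 * math.sqrt(Ec) / N0 * y

# Hypothetical parameters.
R, Eb, sigma2 = 0.5, 2.0, 0.8
Ec = R * Eb                 # energy per channel symbol
N0 = 2 * sigma2
snr_b = Eb / N0

for y in (-2.0, -0.3, 0.7, 1.5):
    print(y, awgn_llr(y, Ec, sigma2), awgn_llr_closed_form(y, Ec, sigma2))

# Conditional mean and variance of the LLR given that 0 was sent.
scale = 4 * math.sqrt(Ec) / N0
mean_llr = scale * math.sqrt(Ec)   # should equal 4 * R * SNR_b
var_llr = scale ** 2 * sigma2      # should equal 8 * R * SNR_b
print(mean_llr, 4 * R * snr_b, var_llr, 8 * R * snr_b)
```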


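Example 13 The binary symmetric channel (BSC) with cross-over probability $0 \le \varepsilon \le 1/2$
has output alphabet $\mathcal{Y} = \{0, 1\}$ and channel law $P_{Y|X}(y|x) = 1 - \varepsilon$ if $y = x$ and $P_{Y|X}(y|x) = \varepsilon$
otherwise. The log-likelihood ratio for the $i$-th bit is the random variable

$$\Lambda_i \triangleq \Lambda_i(Y_i) \triangleq \log \frac{P_{Y_i|X_i}(Y_i|0)}{P_{Y_i|X_i}(Y_i|1)} = \begin{cases} +\log \dfrac{1-\varepsilon}{\varepsilon} & (\text{if } Y_i = 0) \\[1.5ex] -\log \dfrac{1-\varepsilon}{\varepsilon} & (\text{if } Y_i = 1). \end{cases} \qquad (24)$$

Note that $\log \frac{1-\varepsilon}{\varepsilon} \ge 0$. Upon sending $X_i = 0$, $\Lambda_i(Y_i)$ takes on the value $+\log \frac{1-\varepsilon}{\varepsilon}$ with
probability $1 - \varepsilon$ and the value $-\log \frac{1-\varepsilon}{\varepsilon}$ with probability $\varepsilon$. □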


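Example 14 The binary erasure channel (BEC) with erasure probability $0 \le \varepsilon \le 1$ has
output alphabet $\mathcal{Y} = \{0, 1, ?\}$ and channel law $P_{Y|X}(y|x) = 1 - \varepsilon$ if $y = x$, $P_{Y|X}(y|x) = \varepsilon$ if
$y = {?}$, and $P_{Y|X}(y|x) = 0$ otherwise. The log-likelihood ratio for the $i$-th bit is the random
variable

$$\Lambda_i \triangleq \Lambda_i(Y_i) \triangleq \log \frac{P_{Y_i|X_i}(Y_i|0)}{P_{Y_i|X_i}(Y_i|1)} = \begin{cases} +\infty & (\text{if } Y_i = 0) \\ -\infty & (\text{if } Y_i = 1) \\ 0 & (\text{if } Y_i = {?}). \end{cases} \qquad (25)$$

Upon sending $X_i = 0$, $\Lambda_i(Y_i)$ takes on the value $+\infty$ with probability $1 - \varepsilon$ and the value 0
with probability $\varepsilon$. □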
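
The log-likelihood ratios in (24) and (25) are simple enough to be written down as two small functions. This is a minimal sketch; the function names are illustrative, the BSC function assumes $0 < \varepsilon \le 1/2$ so that the logarithm is finite, and the erasure symbol is represented by the string `"?"`.

```python
import math

def bsc_llr(y, eps):
    """LLR (24) of the BSC for output y in {0, 1}; assumes 0 < eps <= 1/2."""
    magnitude = math.log((1 - eps) / eps)
    return magnitude if y == 0 else -magnitude

def bec_llr(y):
    """LLR (25) of the BEC for output y in {0, 1, "?"}.

    The value does not depend on the erasure probability.
    """
    return {0: math.inf, 1: -math.inf, "?": 0.0}[y]

print(bsc_llr(0, 0.1), bsc_llr(1, 0.1))
print(bec_llr(0), bec_llr(1), bec_llr("?"))
```

For the BSC all log-likelihood ratios have the same magnitude, so for $\varepsilon < 1/2$ minimizing $\langle \mathbf{x}, \boldsymbol{\lambda} \rangle$ over the code is equivalent to minimizing the Hamming distance to the received word.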


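Definition 15 A binary-input memoryless channel $(\mathcal{X} = \{0, 1\}, \mathcal{Y}, P_{Y|X})$ is called
output-symmetric if there is an involution^{17} $\sigma : \mathcal{Y} \to \mathcal{Y}$ and two (possibly overlapping) sets $\mathcal{Y}'$ and
$\mathcal{Y}''$ such that:

• $\mathcal{Y}'' = \sigma(\mathcal{Y}')$, $\mathcal{Y}' = \sigma(\mathcal{Y}'')$, $\mathcal{Y}' \cup \mathcal{Y}'' = \mathcal{Y}$.

• For every $y' \in \mathcal{Y}'$ we have $P_{Y|X}(y'|0) = P_{Y|X}(y''|1)$ and $P_{Y|X}(y'|1) = P_{Y|X}(y''|0)$ where
$y'' = \sigma(y')$.

□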

It is easy to see that the three previously discussed channels are output-symmetric. For
the AWGNC one can e.g. choose $\mathcal{Y}' = \mathbb{R}_{+}$ and $\sigma(y') = -y'$, for the BSC one can e.g. choose
$\mathcal{Y}' = \{0\}$ and $\sigma(y') = 1 - y'$, and for the BEC one can e.g. choose $\mathcal{Y}' = \{0, ?\}$, $\sigma(0) = 1$,
$\sigma(1) = 0$, and $\sigma(?) = {?}$.

^{17} An involution is a mapping of order two, i.e. $\sigma(\sigma(y)) = y$ for all $y \in \mathcal{Y}$.

Figure 11: The set $\mathcal{A} = \operatorname{conv}(\{\boldsymbol{\omega}^{(1)}, \ldots, \boldsymbol{\omega}^{(5)}\})$ used in Ex. 16. When the cost
vector lies in $\mathcal{K}_i^{\perp}$ then the linear program decides in favor of vertex $\boldsymbol{\omega}^{(i)}$. Note that the
half-rays that constitute the boundaries between the decision regions are perpendicular to the
corresponding edge of the polytope. (In $n$-dimensional space the half-rays that span a decision
cone are perpendicular to the corresponding facets of the polytope.)

In the rest of this paper we will focus on a specific class of codes, channels, and decoders:

• The codes are assumed to be binary and linear. (Note that a binary code that is defined
by a parity-check matrix is automatically binary and linear.)

• The channels are assumed to be binary-input output-symmetric memoryless channels.

• The decoders are symmetric with respect to codewords.

For this scenario it turns out that the conditional decoding error probability is independent of 
the codeword that was sent. Therefore, for understanding decoders it is sufficient to analyze 
the case where the all-zeros codeword was transmitted. 

The rest of this section will be devoted to recalling some facts from linear programming
that will help to better understand the LPD. Let $n$ be some positive integer. Consider the
following optimization problem

$$\max_{\boldsymbol{\omega} \in \mathcal{A}} \langle \boldsymbol{\omega}, \mathbf{c} \rangle \qquad (26)$$

where $\mathcal{A}$ is a polyhedron in $\mathbb{R}^n$ and where the cost vector $\mathbf{c} \in \mathbb{R}^n$. Such an optimization problem is
called a linear program (LP) and the set of all $\boldsymbol{\omega}$ that achieve the maximum for a given $\mathbf{c}$ is
called the optimum set. Because the polyhedra that we are interested in are bounded we can
actually assume that $\mathcal{A}$ is a polytope.

Example 16 Fig. 11 (left) shows a possible polytope^{18} $\mathcal{A}$ in $n = 2$ dimensions with vertices
$\boldsymbol{\omega}^{(i)}$, $i \in [5]$. One way to describe the set $\mathcal{A}$ is as the convex combination of the set of
vertices: $\mathcal{A} = \operatorname{conv}(\{\boldsymbol{\omega}^{(1)}, \ldots, \boldsymbol{\omega}^{(5)}\})$. Another way is to describe the set $\mathcal{A}$ as the
intersection of half-spaces where each of the half-spaces is described by a single linear (affine)
inequality. □

^{18} Here are some commonly used terms when talking about polytopes: the intersection of an $n$-dimensional
polytope with a tangent hyperplane is called a face, zero-dimensional faces are known as vertices,
one-dimensional faces as edges, $(n-2)$-dimensional faces as ridges, and $(n-1)$-dimensional faces as facets. Note
that edges and facets of two-dimensional polytopes are both one-dimensional objects; therefore one must be
careful when generalizing a certain setup to a higher-dimensional space.

Figure 12: The relaxed set $\mathcal{A}'$, given as the convex hull of its vertices, used in Ex. 19.

Figure 13: A cone $\mathcal{K}$ in $\mathbb{R}^2$ and its dual cone $\mathcal{K}^{\perp}$. Because $\mathcal{K}$ is a proper cone it holds that
$(\mathcal{K}^{\perp})^{\perp} = \mathcal{K}$.

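A special feature of an LP as in (26) is that for any given $\mathbf{c}$ there is always a vertex that
is optimal.^{19} Let $\boldsymbol{\omega}^{(i)}$ be a vertex of $\mathcal{A}$. An interesting question to ask is for which vectors
$\mathbf{c}$ the vertex $\boldsymbol{\omega}^{(i)}$ will be in the optimal set. To answer this question it is useful to introduce
so-called dual cones.

Definition 17 Let $\mathcal{K}$ be a cone in $\mathbb{R}^n$. The dual cone $\mathcal{K}^{\perp}$ is then defined to be the set^{20}

$$\mathcal{K}^{\perp} \triangleq \{\boldsymbol{\omega}' \in \mathbb{R}^n \mid \langle \boldsymbol{\omega}', \boldsymbol{\omega} \rangle \le 0 \ \ \forall\, \boldsymbol{\omega} \in \mathcal{K}\}. \qquad (27)$$

□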

If $\mathcal{K}$ is a proper cone (cf. Sec. 1.2) it turns out that $\mathcal{K}^{\perp}$ is also proper and that $(\mathcal{K}^{\perp})^{\perp} = \mathcal{K}$.
Fig. 13 shows a possible cone in two dimensions along with its dual cone. Cones can either
be described as the conic hull of a set of vectors, as the intersection of half-spaces, or a
combination of both. When a cone is described as the conic hull of a set of vectors then
this yields immediately the representation of the dual cone as the intersection of certain
half-spaces. On the other hand, when a cone is described as the intersection of half-spaces then
this yields immediately the representation of the dual cone as the conic hull of a certain set
of vectors.

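Example 18 Consider the same setup as in Ex. 16 and fix some $i \in [5]$. It turns out that the
set of vectors $\mathbf{c}$ where $\boldsymbol{\omega}^{(i)}$ is in the optimal set is the set $\mathcal{K}_i^{\perp}$ where $\mathcal{K}_i \triangleq \operatorname{conic}(\mathcal{A} - \boldsymbol{\omega}^{(i)})$. The
set $\mathcal{K}_i^{\perp}$ is shown in Fig. 11 (right). It is also instructive to plot the translated set $\boldsymbol{\omega}^{(i)} + \mathcal{K}_i^{\perp}$
in Fig. 11 (left). (Note that when the maximum operator in (26) is replaced by a minimum
operator then the optimal set is $-\mathcal{K}_i^{\perp}$ where $\mathcal{K}_i \triangleq \operatorname{conic}(\mathcal{A} - \boldsymbol{\omega}^{(i)})$ as above.) □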
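
The relation between optimal vertices and decision cones can be explored on a small two-dimensional example. The sketch below uses a hypothetical convex pentagon with integer coordinates (not the polytope of Fig. 11) and assumes the sign convention in which a cost vector $\mathbf{c}$ belongs to the decision cone of vertex $i$ when $\langle \mathbf{c}, \boldsymbol{\omega} - \boldsymbol{\omega}^{(i)} \rangle \le 0$ for all generators $\boldsymbol{\omega} - \boldsymbol{\omega}^{(i)}$ of $\operatorname{conic}(\mathcal{A} - \boldsymbol{\omega}^{(i)})$; function names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def optimal_vertices(vertices, c):
    """Indices of all vertices that maximize <omega, c>."""
    best = max(dot(v, c) for v in vertices)
    return [i for i, v in enumerate(vertices) if dot(v, c) == best]

def in_decision_cone(vertices, i, c):
    """True if <c, w - w_i> <= 0 for every vertex w.

    The differences w - w_i generate conic(A - w_i), so it suffices to
    test the inequality on them.
    """
    wi = vertices[i]
    return all(dot(c, [a - b for a, b in zip(w, wi)]) <= 0 for w in vertices)

# Hypothetical convex pentagon, vertices listed counter-clockwise.
V = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]

print(optimal_vertices(V, (1, 0)))    # a generic cost vector: one vertex
print(optimal_vertices(V, (0, -1)))   # cost vector normal to an edge: two vertices
print([i for i in range(5) if in_decision_cone(V, i, (0, -1))])
```

For a cost vector that is perpendicular to an edge, both end points of that edge are optimal, so the decision cones of neighboring vertices share a boundary half-ray.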

Often it turns out that the linear program in (26) is too complicated to be solved. A
possibility is then to solve a tightly related problem and then to try to infer the solution of
the original problem from the related problem. A popular way of obtaining a related problem
is to relax the set $\mathcal{A}$ to the set $\mathcal{A}'$ and to solve

$$\max_{\boldsymbol{\omega} \in \mathcal{A}'} \langle \boldsymbol{\omega}, \mathbf{c} \rangle. \qquad (28)$$

Of course, the set $\mathcal{A}'$ should have some desirable properties: $\mathcal{A}'$ should not be much larger
than $\mathcal{A}$ and all vertices of $\mathcal{A}$ should be vertices of $\mathcal{A}'$.

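Example 19 Consider the same setup as in Ex. 16. Instead of solving (26) for the set
$\mathcal{A}$ as in Fig. 11 (left) we can solve the relaxed linear program (28) with the set $\mathcal{A}'$, the
convex hull of vertices $\boldsymbol{\omega}'^{(1)}, \boldsymbol{\omega}'^{(2)}, \ldots$, as in Fig. 12 (left). We see that $\mathcal{A}'$ fulfills the desirable
properties that were listed above: $\mathcal{A}'$ is not much larger than $\mathcal{A}$ and the vertices $\boldsymbol{\omega}^{(i)}$, $i \in [5]$, of
$\mathcal{A}$ are also vertices of $\mathcal{A}'$. Fig. 12 (right) shows for which $\mathbf{c}$ we decide for which vertex. Of
course, the regions fulfill $\mathcal{K}_i'^{\perp} \subseteq \mathcal{K}_i^{\perp}$ for $i \in [5]$. Moreover, the fact that $\mathcal{A}'$ tightly resembles
$\mathcal{A}$ can also be seen from the fact that $\mathcal{K}_i'^{\perp}$ is nearly as large as $\mathcal{K}_i^{\perp}$ for $i \in [5]$. □

^{19} For a generic vector $\mathbf{c}$ the set of optimal points will contain exactly one vertex of the polytope. However,
for any face of the polytope there is at least one cost vector $\mathbf{c}$ such that this face is the optimal set.

^{20} The dual cone can be defined by $\langle \mathbf{x}, \mathbf{y} \rangle \le 0$ or by $\langle \mathbf{x}, \mathbf{y} \rangle \ge 0$; here we have chosen the first possibility.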
Contemplating Figs. 11 and 12, it does not look as if this relaxation really bought us
anything. In fact, the optimization has to be carried out over a more complex region.
However, for higher-dimensional problems the relaxation approach can work very nicely. E.g. the
fundamental polytope $\mathcal{P}(\mathbf{H})$ is a relaxation of the set $\operatorname{conv}(\mathcal{C})$ [31, 32] which seems to be
quite tight, especially in the case of LDPC codes. Whereas $\operatorname{conv}(\mathcal{C})$ is usually very difficult to
describe^{21}, we will see that the fundamental polytope has a relatively simple description.

We conclude this section with a warning to the uninitiated reader: whereas two-dimensional 
pictures of polytopes and cones are very useful to get an initial understanding of the various 
definitions, higher dimensional polytopes and cones can behave quite differently. Note that 
in the channel coding case the high-dimensional spaces are unavoidable since it is well known 
from information theory that well-performing codes need to have a certain length. 

4 Graph-Cover Decoding 

This section introduces graph-cover decoding (GCD) which is the theoretical tool that will 
help to link LPD and MPID. On the one hand, GCD will be shown to be essentially equivalent 
to LPD. On the other hand, we will discuss how GCD can serve as a model of what is going 
on in MPID. Sometimes it is an exact model but usually it is just a very good approximation. 
The findings in this section will be corroborated by some simulation results that will be 
presented at the end of Sec. 5. 

In the following we assume that we consider data transmission over a channel as discussed 
in Sec. 3. 

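Definition 20 (Lifting) Let $\tilde{\mathsf{T}}$ be an arbitrary $M$-cover of $\mathsf{T}(\mathbf{H})$. The $M$-lifting of a
length-$n$ vector $\mathbf{v}$ is the vector $\tilde{\mathbf{v}} = \mathbf{v}^{\uparrow M}$ with entries $\tilde{v}_{i,m} \triangleq v_i$ for all $(i, m) \in \mathcal{I}(\mathbf{H}) \times [M]$, i.e. $\tilde{\mathbf{v}}$ is
a vector of length $Mn$ where each entry is repeated $M$ times. □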

We remind the reader of the MAPD/MLD decision rule formulation in (16) and (18). That
rule aims to find the codeword that gives the largest log-likelihood ratio given that $\mathbf{y}$ was
received. GCD extends this idea in the following way: instead of searching only among the
codewords of the base code, we want to find the codeword in any finite graph cover that gives
the largest log-likelihood ratio given that (the lifting of) $\mathbf{y}$ was received. In order to obtain a
fair comparison we will rescale the log-likelihood ratios by the cover degree.

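However, before formulating GCD more precisely we have to extend the definition of the
channel law. Let $P_{Y|X}(y|x)$ be the channel law of a memoryless channel. We define the
extended joint conditional pmf/pdf of receiving a vector $\tilde{\mathbf{y}}$ of length $Mn$ upon sending a
vector $\tilde{\mathbf{x}}$ of length $Mn$ to be

$$P_{\tilde{\mathbf{Y}}|\tilde{\mathbf{X}}}(\tilde{\mathbf{y}}|\tilde{\mathbf{x}}) \triangleq \prod_{i \in [n]} \prod_{m \in [M]} P_{Y|X}(\tilde{y}_{i,m}|\tilde{x}_{i,m}). \qquad (29)$$

Definition 21 We define graph-cover decoding (GCD) to be the following decision rule:

$$(\hat{M}, \hat{\tilde{\mathsf{T}}}, \hat{\tilde{\mathbf{x}}})^{\mathrm{GCD}(\mathbf{H})}(\mathbf{y}) \triangleq \arg\max_{(M, \tilde{\mathsf{T}}, \tilde{\mathbf{x}}) \in \tilde{\mathcal{Q}}(\mathbf{H})} \frac{1}{M} \log P_{\tilde{\mathbf{Y}}|\tilde{\mathbf{X}}}(\mathbf{y}^{\uparrow M}|\tilde{\mathbf{x}}), \qquad (30)$$

where ties are resolved in a systematic or arbitrary way. Moreover, let
$\hat{\boldsymbol{\omega}}^{\mathrm{GCD}(\mathbf{H})}(\mathbf{y}) \triangleq \boldsymbol{\omega}(\hat{\tilde{\mathbf{x}}}^{\mathrm{GCD}(\mathbf{H})}(\mathbf{y}))$. □

^{21} An exception are e.g. convolutional codes with not too many states.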

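The $1/M$ factor on the right-hand side of (30) is the promised rescaling factor that makes
a fair comparison of the log-likelihood ratios. Note that the expression in (30) is also
well-defined in the following sense: let $\mathbf{x}$ be a codeword in $\mathcal{C}$. Then, for any $M$-cover graph $\tilde{\mathsf{T}}$ of
$\mathsf{T}(\mathbf{H})$ the vector $\tilde{\mathbf{x}} = \mathbf{x}^{\uparrow M}$ is a codeword in $\mathcal{C}(\tilde{\mathsf{T}})$ with the property that

$$\log P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \frac{1}{M} \log P_{\tilde{\mathbf{Y}}|\tilde{\mathbf{X}}}(\mathbf{y}^{\uparrow M}|\tilde{\mathbf{x}}). \qquad (31)$$

(A similar statement can be made about the relationship of a codeword in some finite cover
to its liftings in finite covers of that finite cover.)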
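
The rescaling property can be illustrated numerically for a memoryless channel. The sketch below uses a BSC with a hypothetical cross-over probability; the function names `lift` and `log_likelihood` as well as the example vectors are illustrative assumptions.

```python
import math

def lift(v, M):
    """M-lifting of Def. 20: every entry is repeated M times."""
    return [vi for vi in v for _ in range(M)]

def log_likelihood(y, x, eps):
    """log P(y | x) for a memoryless BSC with cross-over probability eps."""
    return sum(math.log(1 - eps if yi == xi else eps) for yi, xi in zip(y, x))

eps = 0.1            # hypothetical cross-over probability
x = [0, 1, 1, 0]     # a codeword of the code of Ex. 4
y = [0, 1, 0, 0]     # hypothetical received word (one bit flipped)

for M in (1, 2, 3, 5):
    lifted = log_likelihood(lift(y, M), lift(x, M), eps) / M
    print(M, lifted, log_likelihood(y, x, eps))
```

Because the extended channel law is a product over all $Mn$ positions, lifting multiplies the log-likelihood by exactly $M$, and the factor $1/M$ removes this dependence on the cover degree.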

The next proposition shows that GCD and the LPD are essentially equivalent. 

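Proposition 22 For a given received vector $\mathbf{y}$, let $\hat{\boldsymbol{\omega}}^{\mathrm{GCD}(\mathbf{H})}(\mathbf{y})$ be the GCD decision as
defined in Def. 21 and let $\hat{\boldsymbol{\omega}}^{\mathrm{LPD}(\mathbf{H})}(\mathbf{y})$ be the LPD decision as given in (20) with
$\operatorname{relax}(\operatorname{conv}(\mathcal{C})) = \mathcal{P}(\mathbf{H})$. Then

$$\hat{\boldsymbol{\omega}}^{\mathrm{GCD}(\mathbf{H})}(\mathbf{y}) = \hat{\boldsymbol{\omega}}^{\mathrm{LPD}(\mathbf{H})}(\mathbf{y}). \qquad (32)$$

(For this statement we assume that if ties appear in either decoder, they are resolved in
the same way.)

Proof: See Sec. A.2. □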

Let us now turn our attention to the connection between GCD and MPID. Recall our
discussion about MPID for the trivial code in Sec. 1.1. On the one hand, we considered
MPID of the received vector $\mathbf{y}$ on the base Tanner graph $\mathsf{T}$ shown in Fig. 1 and on the other
hand, we considered MPID of $\tilde{\mathbf{y}}$ on the triple cover $\tilde{\mathsf{T}}$ shown in Fig. 3 (left). Because $\mathsf{T}$ and
$\tilde{\mathsf{T}}$ look locally the same, the computation tree for variable node $X_i$ after $t$ iterations will be
identical to the computation tree for variable node $\tilde{X}_{i,m}$ after $t$ iterations, where $m \in [3]$
is arbitrary. This is shown in Fig. 4 for the variable node $X_2$ and after $t = 2$ iterations.
Moreover, under the assumption that $\tilde{\mathbf{y}} = \mathbf{y}^{\uparrow 3}$ it can readily be verified that the messages on
the two computation trees are the same. In that way we see that because MPID is operating
locally on Tanner graphs, MPID cannot distinguish whether it is decoding the code defined by the
base Tanner graph $\mathsf{T}$ or any of the codes defined by the finite covers of $\mathsf{T}$. If the decoding of
these codes is done in a MAPD/MLD fashion, then MPID is essentially equivalent to GCD,
otherwise GCD is just a (usually very good) approximation to MPID.

There are cases where GCD is the right model for MPID. The list includes Tanner graphs
that are trees (i.e. have no cycle), codes represented by trellises, codes represented by
tail-biting trellises, and cycle codes (i.e. codes where all bit nodes have degree two). Additionally,
when we transmit over the BEC then GCD is also the right model, independently of the
Tanner graph of the code.

In conclusion, we see that the locality, which makes MPID a low-complexity algorithm, is 
also the main weakness of MPID. 

5 Properties of Fundamental Polytopes and Cones 

The fundamental polytope was introduced in Def. 8. In the meantime we have seen that it
is one of the objects of central interest in this paper: it turns up when considering
GCD and LPD and, because of the closeness of MPID and GCD, it seems to be important
for MPID as well. It is therefore natural to try to better understand this object. To that end, this
section will look at different ways of describing the fundamental polytope and will discuss
various properties of it. Actually, we will mostly look at the fundamental cone, which is
the fundamental polytope around the vertex $\mathbf{0}$ blown up to infinity, in other words, the
conic hull of the fundamental polytope. Understanding the fundamental cone is sufficient
because we restrict ourselves to using binary-input output-symmetric memoryless channels, as
was outlined in Sec. 3.


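Definition 23 The fundamental cone $\mathcal{K}(\mathbf{H})$ is defined to be the conic hull of the fundamental
polytope $\mathcal{P}(\mathbf{H})$, i.e.

$$\mathcal{K}(\mathbf{H}) \triangleq \operatorname{conic}(\mathcal{P}(\mathbf{H})).$$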


□ 


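From this definition it follows easily that $\mathcal{P}(\mathbf{H}) \subseteq \mathcal{K}(\mathbf{H})$ and that for any $\boldsymbol{\omega} \in \mathcal{K}(\mathbf{H})$ there
is an $\alpha \in \mathbb{R}_{++}$ (in fact, a whole interval of $\alpha$'s) such that $\alpha \cdot \boldsymbol{\omega} \in \mathcal{P}(\mathbf{H})$.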

In Ex. 18 we saw that the set of cost vectors for which a given point of the feasible set is in the
optimal set is given by a dual cone, namely the dual of the conic hull of the feasible set shifted
by that point. In the case of LPD and GCD, we see that $\mathbf{0}$ is in the optimal set when
$\boldsymbol{\lambda}$ lies in $-(\operatorname{conic}(\mathcal{P}(\mathbf{H}) - \mathbf{0}))^{*}$, which equals $-\mathcal{K}(\mathbf{H})^{*}$.$^{21}$ This observation emphasizes the fact
that the fundamental cone contains all the relevant information and that it is sufficient to study the
fundamental cone (instead of the fundamental polytope). For that reason, all vectors in $\mathcal{K}(\mathbf{H})$
will be called pseudo-codewords. Moreover, if $\boldsymbol{\omega} \in \mathcal{K}(\mathbf{H})$ and $\{\alpha \cdot \boldsymbol{\omega} \mid \alpha \in \mathbb{R}_+\}$ is an edge of the
fundamental cone then we call $\boldsymbol{\omega}$ a minimal pseudo-codeword. This generalizes the notion of
minimal codewords [45, 46, 47, 48]$^{22}$ which are the edges of $\operatorname{conic}(\mathcal{C})$.$^{23}$ Note that although
all codewords are vertices of the fundamental polytope [31, 32], a minimal codeword need
not necessarily be a minimal pseudo-codeword! (Given a minimal codeword there are simple
conditions to check if it is a minimal pseudo-codeword; however, we are not aware of a general
result that says when a minimal codeword is also a minimal pseudo-codeword. For example, having a
Tanner graph with girth six is neither sufficient nor necessary for all minimal codewords
to be minimal pseudo-codewords.)

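In Sec. 2 we have seen that $\mathcal{Q}(\mathbf{H})$ and $\mathcal{P}(\mathbf{H})$ are tightly related. Not surprisingly, there is
a connection between $\mathcal{Q}(\mathbf{H})$ and $\mathcal{K}(\mathbf{H})$, a connection that is explored in the following lemma.

Lemma 24 Remember that if $\tilde{\mathbf{x}}$ is a codeword in some $M$-cover $\tilde{\mathsf{T}}$ of $\mathsf{T}(\mathbf{H})$, then $M\boldsymbol{\omega}(\tilde{\mathbf{x}}) \in
\mathbb{Z}^n$ is called the unscaled pseudo-codeword corresponding to $\tilde{\mathbf{x}}$. Let

$$\mathcal{Z}(\mathbf{H}) \triangleq \bigcup_{(M, \tilde{\mathsf{T}}, \tilde{\mathbf{x}}) \in \mathcal{Q}(\mathbf{H})} \{ M \boldsymbol{\omega}(\tilde{\mathbf{x}}) \} \qquad (33)$$

be the set of all these unscaled pseudo-codewords. This set fulfills $\mathcal{Z}(\mathbf{H}) = \mathcal{K}(\mathbf{H}) \cap \mathbb{Z}^n$ and
$\mathcal{Z}(\mathbf{H}) = \mathcal{C}$ (in $\mathbb{F}_2$). Moreover, for every minimal pseudo-codeword $\boldsymbol{\omega}$ there is an $\alpha \in \mathbb{R}_{++}$ (in
fact, a whole set of $\alpha$'s) such that $\alpha \boldsymbol{\omega} \in \mathcal{Z}(\mathbf{H})$.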

$^{21}$Note that LPD/GCD is formulated as a minimization and not as a maximization problem, therefore the
minuses in front of the dual cones.

$^{22}$A side remark: interestingly, Decoding Algorithm 1 in [45] can be seen as a simplex-type algorithm on
$\operatorname{conv}(\mathcal{C})$ to solve the LP in (19).

$^{23}$For a further discussion of minimal pseudo-codewords and minimal pseudo-codeword enumerators, see [49,
50, 51].
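Object                           Number of variables                                        Number of (in)equalities

$\mathcal{P}$ in Lemma 25        $n + |\mathcal{J}| \, 2^{w_{\mathrm{row}}-1}$              $2n + |\mathcal{J}| \, (w_{\mathrm{row}} + 2^{w_{\mathrm{row}}-1} + 1)$
$\mathcal{K}$ in Lemma 25        $n + |\mathcal{J}| \binom{w_{\mathrm{row}}}{2}$            $n + |\mathcal{J}| \, \big(w_{\mathrm{row}} + \binom{w_{\mathrm{row}}}{2}\big)$
$\mathcal{P}$ in Lemma 26        $n$                                                        $2n + |\mathcal{J}| \, 2^{w_{\mathrm{row}}-1}$
$\mathcal{K}$ in Lemma 26        $n$                                                        $n + |\mathcal{J}| \, w_{\mathrm{row}}$

Table 1: The description complexity of the fundamental polytope and cone for a $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular
LDPC code of length $n$. Here, $|\mathcal{J}| = n w_{\mathrm{col}} / w_{\mathrm{row}}$.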


Proof: See Sec. A.3. 


□ 


The following lemmas discuss different representations of the fundamental polytope and 
cone. 


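Lemma 25 Let $\mathbf{P}^{(j)}$ be a $2^{|\mathcal{I}_j|-1} \times |\mathcal{I}_j|$ matrix containing all the binary vectors of length $|\mathcal{I}_j|$
with even Hamming weight, i.e. the codewords of $\mathcal{C}_j$, i.e. the codewords of a single-parity-check
code of length $|\mathcal{I}_j|$. Let $\mathbf{P}'^{(j)}$ be a $\binom{|\mathcal{I}_j|}{2} \times |\mathcal{I}_j|$ matrix containing all the binary vectors of length
$|\mathcal{I}_j|$ with Hamming weight two. The fundamental polytope $\mathcal{P} = \mathcal{P}(\mathbf{H})$ and the fundamental
cone $\mathcal{K} = \mathcal{K}(\mathbf{H})$ can be described by the following sets of linear inequalities, respectively:

$$\mathcal{P} = \left\{ \boldsymbol{\omega} \in \mathbb{R}^n \;\middle|\; \begin{array}{l} \forall i \in \mathcal{I}: \ 0 \le \omega_i \le 1 \\ \forall j \in \mathcal{J}: \ \boldsymbol{\omega}_{\mathcal{I}_j} = \boldsymbol{\alpha}^{(j)} \mathbf{P}^{(j)}, \ \boldsymbol{\alpha}^{(j)} \in \mathbb{R}_+^{2^{|\mathcal{I}_j|-1}}, \ \|\boldsymbol{\alpha}^{(j)}\|_1 = 1 \end{array} \right\} \qquad (34)$$

$$\mathcal{K} = \left\{ \boldsymbol{\omega} \in \mathbb{R}^n \;\middle|\; \begin{array}{l} \forall i \in \mathcal{I}: \ 0 \le \omega_i \\ \forall j \in \mathcal{J}: \ \boldsymbol{\omega}_{\mathcal{I}_j} = \boldsymbol{\alpha}^{(j)} \mathbf{P}'^{(j)}, \ \boldsymbol{\alpha}^{(j)} \in \mathbb{R}_+^{\binom{|\mathcal{I}_j|}{2}} \end{array} \right\} \qquad (35)$$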

Proof: The expression for $\mathcal{P}$ is a direct consequence of the definition given in (12), and the
expression for $\mathcal{K}$ is obtained by taking the conic hull of $\mathcal{P}$. Note that because all binary
vectors of even Hamming weight with Hamming weight larger than two can be written as the
(integer) sum of several binary vectors of Hamming weight two, we were able to replace the
matrices $\mathbf{P}^{(j)}$ by the matrices $\mathbf{P}'^{(j)}$ in the expression for $\mathcal{K}$. □


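Lemma 26 The fundamental polytope $\mathcal{P} = \mathcal{P}(\mathbf{H})$ and the fundamental cone $\mathcal{K} = \mathcal{K}(\mathbf{H})$ can
be described by the following sets of linear inequalities, respectively:

$$\mathcal{P} = \left\{ \boldsymbol{\omega} \in \mathbb{R}^n \;\middle|\; \begin{array}{l} \forall i \in \mathcal{I}: \ 0 \le \omega_i \le 1 \\ \forall j \in \mathcal{J}(\mathbf{H}), \ \forall \mathcal{I}'_j \subseteq \mathcal{I}_j, \ |\mathcal{I}'_j| \text{ odd}: \ \sum_{i \in \mathcal{I}'_j} \omega_i + \sum_{i \in \mathcal{I}_j \setminus \mathcal{I}'_j} (1 - \omega_i) \le |\mathcal{I}_j| - 1 \end{array} \right\}$$

$$\mathcal{K} = \left\{ \boldsymbol{\omega} \in \mathbb{R}^n \;\middle|\; \begin{array}{l} \forall i \in \mathcal{I}: \ 0 \le \omega_i \\ \forall j \in \mathcal{J}(\mathbf{H}), \ \forall i' \in \mathcal{I}_j: \ \sum_{i \in \mathcal{I}_j \setminus \{i'\}} \omega_i \ge \omega_{i'} \end{array} \right\}$$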

Proof: We do not go into the details of deriving these inequalities. For a discussion, see
e.g. [32, 52]. Note that the inequalities that describe $\mathcal{K}(\mathbf{H})$ are exactly those inequalities
describing $\mathcal{P}(\mathbf{H})$ which are homogeneous, i.e. that define half-spaces whose boundary goes through the origin.
□

Let us consider the description complexities of the various characterizations of the
fundamental polytope and cone in Lemmas 25 and 26. For reasons of simplicity we consider a
$(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular binary LDPC code, but similar expressions can be obtained for irregular
binary LDPC codes. The number of variables and (in)equalities that are needed are listed
in Tab. 1. For the fundamental polytope we observe a linear behavior in the block length $n$
but an exponential behavior in the row weight $w_{\mathrm{row}}$. For binary LDPC codes, where $w_{\mathrm{row}}$
is a small number, this is usually not a problem because $2^{w_{\mathrm{row}}}$ is of reasonable magnitude.
But for codes where $w_{\mathrm{row}}$ is on the order of the block length $n$ the description complexity
obviously grows exponentially in $n$. Interestingly, as shown in [32, Appendix II], there is
a way to obtain a description of the fundamental polytope where the number of variables
and the number of (in)equalities grow only polynomially and not exponentially in $w_{\mathrm{row}}$.
Indeed, the description complexity for that representation turns out to be on the order of
$O(n|\mathcal{J}| + |\mathcal{J}| w_{\mathrm{row}}^2 + n w_{\mathrm{col}} w_{\mathrm{row}})$. While this representation is obviously favorable for $w_{\mathrm{row}}$'s
on the order of $n$, it is clearly inferior for codes with small $w_{\mathrm{row}}$.

Because understanding GCD and LPD is tightly related to understanding the fundamental
cone, the following lemma lists some reformulations of the (in)equalities that describe the
fundamental cone.


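Lemma 27 For a vector $\boldsymbol{\omega} \in \mathbb{R}^n$, $\boldsymbol{\omega} \ge \mathbf{0}$, the following conditions are equivalent:

• $\boldsymbol{\omega} \in \mathcal{K}(\mathbf{H})$.

• For each $j \in \mathcal{J}$ we have

$$\begin{array}{r} -\omega'_1 + \omega'_2 + \omega'_3 + \cdots + \omega'_{|\mathcal{I}_j|} \ge 0, \\ +\omega'_1 - \omega'_2 + \omega'_3 + \cdots + \omega'_{|\mathcal{I}_j|} \ge 0, \\ +\omega'_1 + \omega'_2 - \omega'_3 + \cdots + \omega'_{|\mathcal{I}_j|} \ge 0, \\ \vdots \qquad\qquad \\ +\omega'_1 + \omega'_2 + \omega'_3 + \cdots - \omega'_{|\mathcal{I}_j|} \ge 0, \end{array}$$

where $\boldsymbol{\omega}' \triangleq \boldsymbol{\omega}_{\mathcal{I}_j}$.

• For each $j \in \mathcal{J}$ we have $(\mathbf{1}_{|\mathcal{I}_j| \times |\mathcal{I}_j|} - 2 \, \mathbf{I}_{|\mathcal{I}_j| \times |\mathcal{I}_j|}) \, \boldsymbol{\omega}_{\mathcal{I}_j}^{\mathsf{T}} \ge \mathbf{0}^{\mathsf{T}}$, where $\mathbf{1}_{|\mathcal{I}_j| \times |\mathcal{I}_j|}$ is the all-ones
matrix of size $|\mathcal{I}_j| \times |\mathcal{I}_j|$ and where $\mathbf{I}_{|\mathcal{I}_j| \times |\mathcal{I}_j|}$ is the identity matrix of size $|\mathcal{I}_j| \times |\mathcal{I}_j|$.

• For each $j \in \mathcal{J}$ we have for each $i' \in \mathcal{I}_j$: $\sum_{i \in \mathcal{I}_j \setminus \{i'\}} \omega_i \ge \omega_{i'}$, or, equivalently,
$\sum_{i \in \mathcal{I}_j} \omega_i \ge 2 \omega_{i'}$.

• For each $j \in \mathcal{J}$ we have: $\sum_{i \in \mathcal{I}_j} \omega_i \ge 2 \cdot (\max_{i \in \mathcal{I}_j} \omega_i)$, which can also be written as
$\|\boldsymbol{\omega}_{\mathcal{I}_j}\|_1 \ge 2 \cdot \|\boldsymbol{\omega}_{\mathcal{I}_j}\|_\infty$.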
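The last condition of Lemma 27 lends itself to a direct membership test. The following is a minimal sketch, assuming the parity-check matrix is given as a list of 0/1 rows; the function name `in_fundamental_cone`, the tolerance parameter, and the small example matrix are illustrative choices and not part of the paper.

```python
def in_fundamental_cone(H, omega, tol=1e-12):
    """Test omega against the last condition of Lemma 27, check by check.

    H is a list of 0/1 rows (one row per check node j); omega is a list of
    real numbers of length n. For every check j we require
    ||omega_{I_j}||_1 >= 2 * ||omega_{I_j}||_inf, plus omega >= 0.
    """
    if any(w < -tol for w in omega):
        return False
    for row in H:
        # Components of omega that participate in this check (the set I_j).
        local = [omega[i] for i, h in enumerate(row) if h == 1]
        if local and sum(local) < 2 * max(local) - tol:
            return False
    return True


# Hypothetical toy matrix with two checks of degree three.
H_toy = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
print(in_fundamental_cone(H_toy, [1, 1, 2, 1]))  # both checks hold with equality
print(in_fundamental_cone(H_toy, [0, 0, 1, 0]))  # a single non-zero entry violates both checks
```

The test costs one pass over each row of the matrix, in line with the count of $n + |\mathcal{J}| w_{\mathrm{row}}$ inequalities for the cone in Lemma 26.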


Lemma 28 Assume that the Tanner graph $\mathsf{T}(\mathbf{H})$ of a code with parity-check matrix $\mathbf{H}$ is a
forest, i.e. it has no cycles. Then $\mathcal{P}(\mathbf{H}) = \operatorname{conv}(\mathcal{C})$, i.e. $\mathcal{P}(\mathbf{H})$ is the convex hull of all the
codewords.


Proof: See Sec. A.4. 


□ 


One of the consequences of Lemma 28 is that GCD and LPD equal MAPD/MLD for codes
that are described by cycle-free Tanner graphs. Moreover, as is well-known from graphical
models, the max-product algorithm is also equal to MAPD/MLD in the cycle-free Tanner
graph case. Unfortunately, as was shown in [53], cycle-free Tanner graphs of binary codes, where
all constraint nodes are simple parity-checks, support only weak codes.

Example 29 It is usually difficult to show a picture of the fundamental polytope because it
is a polytope in $\mathbb{R}^n$ and even small codes usually have a block length $n$ that is larger than
3. In this example we discuss a code of length $n = 7$ where all the essential features of
the fundamental polytope can be shown in a three-dimensional space because the effective
dimension of the fundamental polytope is three.

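The code $\mathcal{C}$ under consideration is the $[7, 2, 3]$ binary linear code with parity-check matrix

$$\mathbf{H} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix},$$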

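whose Tanner graph $\mathsf{T}(\mathbf{H})$ is shown in Fig. 14 (left). Because all bit nodes have degree two
this is a so-called cycle code. It can easily be verified that the code $\mathcal{C}$ consists of the four
codewords

$$\mathbf{x}^{(1)} = (0000000), \quad \mathbf{x}^{(2)} = (1110000), \quad \mathbf{x}^{(3)} = (0000111), \quad \mathbf{x}^{(4)} = (1110111).$$

Fig. 14 (right) shows a possible double cover. One can check that $\tilde{\mathbf{x}} = (1{:}0, 1{:}0, 1{:}0, 1{:}1, 1{:}0, 1{:}0, 1{:}0)$
corresponds to a pseudo-codeword with $\boldsymbol{\omega}^{(5)} = \boldsymbol{\omega}(\tilde{\mathbf{x}}) = (\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, 1, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2})$. Using Lemma 26,
and applying some simplifications, the fundamental polytope can be expressed as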



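$$\mathcal{P}(\mathbf{H}) = \left\{ \boldsymbol{\omega} \in \mathbb{R}^7 \;\middle|\; \begin{array}{l} 0 \le \omega_i \le 1 \quad \forall i \in [7] \\ \omega_1 = \omega_2 = \omega_3, \quad \omega_5 = \omega_6 = \omega_7 \\ \omega_4 \le 2 \min(\omega_2, \, 1 - \omega_2, \, \omega_5, \, 1 - \omega_5) \end{array} \right\}.$$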

It turns out that this fundamental polytope has five vertices: the four codewords listed above
and the pseudo-codeword just mentioned. Because $\omega_1 = \omega_2 = \omega_3$ and $\omega_5 = \omega_6 = \omega_7$, the
effective dimension of $\mathcal{P}(\mathbf{H})$ is three and it is sufficient to focus on the three-dimensional
subspace spanned by $(\omega_{123}, \omega_4, \omega_{567})$ where $\omega_{123} \triangleq \omega_1 = \omega_2 = \omega_3$ and $\omega_{567} \triangleq \omega_5 = \omega_6 = \omega_7$.

Fig. 15 (right) shows the fundamental polytope in this space. For comparison purposes. Fig. 15 
(left) shows the four codewords and the convex hull thereof (whose effective dimension is two). 

When drawing the decision regions for MAPD/MLD and LPD it turns out to be sufficient
to consider the three-dimensional space spanned by $(\lambda_{123}, \lambda_4, \lambda_{567})$ where $\lambda_{123} \triangleq \lambda_1 + \lambda_2 + \lambda_3$
and $\lambda_{567} \triangleq \lambda_5 + \lambda_6 + \lambda_7$. This follows from the fact that $(\lambda_{123}, \lambda_4, \lambda_{567})$ is a sufficient
statistic for MAPD/MLD and LPD because $\sum_{i \in [7]} \omega_i \lambda_i = \omega_{123} \lambda_{123} + \omega_4 \lambda_4 + \omega_{567} \lambda_{567}$ for any
$\boldsymbol{\omega} \in \mathcal{P}(\mathbf{H})$. For any $\lambda_4$ the MAPD/MLD decision regions are shown in Fig. 16 (left).
It is not surprising that the value of $\lambda_4$ has no influence on the decision since $x_4$ is known
to be equal to zero in all codewords. For LPD the decision regions are shown in Fig. 16 (left)
when $\lambda_4 \ge 0$ and in Fig. 16 (right) when $\lambda_4 < 0$. Finally, for MSA and SPA decoding the
decision regions are shown in Fig. 17 for $\lambda_4 = -2$. We note that in contrast to MAPD/MLD,

Footnote to the parity-check matrix in Ex. 29: Some of the features of this code were also discussed in [54, 52].
Figure 14: Left: Tanner graph $\mathsf{T}(\mathbf{H})$ for the parity-check matrix $\mathbf{H}$ in Ex. 29. Right: a
(possible) double cover $\tilde{\mathsf{T}}$ of $\mathsf{T}(\mathbf{H})$.




Figure 15: Left: codewords of C and the polytope conv(C) for the code in Ex. 29. Right: 
fundamental polytope 'P(H). 


MSA and SPA decoding cannot exploit that $x_4$ equals zero for all valid codewords since no
locally-operating, message-passing algorithm can come to this conclusion. Because $\mathbf{H}$ is the
parity-check matrix of a cycle code, MSA decoding should behave as predicted by GCD, which
is indeed the case as shown in Fig. 17 (left). Fig. 17 (right) indicates that GCD also gives
quite accurate predictions for SPA decoding for the present code. □
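The membership claims of Ex. 29 can be checked mechanically against the odd-subset inequalities of Lemma 26. The following is a minimal sketch under the assumption that the parity-check matrix of Ex. 29 is entered row by row as a list of lists; the function name `in_fundamental_polytope` and the tolerance are illustrative choices. The sketch only tests membership; it does not verify that a point is a vertex.

```python
from itertools import combinations

# Parity-check matrix of Ex. 29 (a cycle code: every column has weight two).
H = [
    [1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1],
]


def in_fundamental_polytope(H, omega, tol=1e-9):
    """Check the box constraints and all odd-subset inequalities of Lemma 26."""
    if any(w < -tol or w > 1 + tol for w in omega):
        return False
    for row in H:
        idx = [i for i, h in enumerate(row) if h == 1]
        for size in range(1, len(idx) + 1, 2):  # odd subset sizes only
            for odd in combinations(idx, size):
                lhs = sum(omega[i] for i in odd)
                lhs += sum(1 - omega[i] for i in idx if i not in odd)
                if lhs > len(idx) - 1 + tol:
                    return False
    return True


codewords = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
pseudo_codeword = [0.5, 0.5, 0.5, 1.0, 0.5, 0.5, 0.5]

print(all(in_fundamental_polytope(H, x) for x in codewords))
print(in_fundamental_polytope(H, pseudo_codeword))
print(in_fundamental_polytope(H, [0, 0, 0, 1, 0, 0, 0]))  # violates omega_4 <= 2*omega_2
```

Because each check of this code has degree at most three, the exhaustive loop over odd subsets stays tiny; for large row weights the same loop grows as $2^{w_{\mathrm{row}}-1}$ per check, as listed in Tab. 1.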

Example 30 We consider a (3,5)-regular [155, 62] binary LDPC code based on a
parity-check matrix of size 93 x 155 for data transmission over an AWGNC. The parity-check matrix
has been randomly generated and four-cycles have been eliminated. Moreover, the matrix
has full rank and so the code rate is exactly 2/5.

The full space of LLR vectors is 155-dimensional. However, for obvious practical reasons
we can only show two-dimensional slices through that space. Two interesting slices have
been picked as follows. We first looked for a low-weight minimal pseudo-codeword in the
fundamental cone: the one we selected has AWGNC pseudo-weight 13.65. Next, we laid
the unit vectors $\boldsymbol{\lambda}'_1$ and $\boldsymbol{\lambda}'_2$ such that the pairwise decision region boundary is the hyperplane
defined by $\lambda'_1 = 0$ and such that $\mathrm{E}[\boldsymbol{\Lambda} \mid \mathbf{X} = \mathbf{0}]$ lies in the plane spanned by $\boldsymbol{\lambda}'_1$ and $\boldsymbol{\lambda}'_2$. Moreover,
the unit vector $\boldsymbol{\lambda}'_3$ has been chosen randomly such that it is orthogonal to $\boldsymbol{\lambda}'_1$ and $\boldsymbol{\lambda}'_2$. Given
this setup, two slices are shown in Figs. 18 and 19, respectively. In both cases we compare
SPA decoding (with max. 100 iterations) and LPD. Both plots indicate that the decoding
regions of LPD give a very good “first-order” approximation of SPA decoding.

Some final comments:
Figure 16: Decision regions for the code $\mathcal{C}$ described by the parity-check matrix $\mathbf{H}$ in Ex. 29.
Left: Decision regions for MAPD/MLD (for any $\lambda_4$). These are also the decision regions for
GCD and LPD if $\lambda_4 \ge 0$. Right: Decision regions for GCD and LPD if $\lambda_4 < 0$. (The decision
region $\mathcal{D}^{(5)}$ is the square spanned by $(2\lambda_4, 0)$, $(0, 2\lambda_4)$, $(-2\lambda_4, 0)$, and $(0, -2\lambda_4)$.)



Figure 17: Decision regions under iterative decoding for $\lambda_4 = -2$ for the code $\mathcal{C}$ described
by the parity-check matrix $\mathbf{H}$ in Ex. 29. For all simulated $\boldsymbol{\lambda}$-vectors 30 iterations were
performed. The shade of gray indicates the codeword decision; within the regions the
slight differences in the shade of gray indicate the convergence time. Note that in the middle
square corresponding to $\mathcal{D}^{(5)}$ the decoders did not converge to a codeword. Left: Decision
regions under MSA decoding. Right: Decision regions under SPA decoding.
Figure 18: Decision region plots for the [155,62] binary linear LDPC code in Ex. 30. Shown
is a slice in the plane spanned by $\boldsymbol{\lambda}'_1$ and $\boldsymbol{\lambda}'_2$ with $\lambda'_3 = 0$. Observe that the $\lambda'_1$-axis
is stretched compared to the $\lambda'_2$-axis. (See main text for more explanations.) Left: SPA
decoding decision regions (max. 100 iterations). Right: LPD decision regions (white: all-zeros
codeword/black: non-zero (pseudo-)codeword).



Figure 19: Decision region plots for the [155,62] binary linear LDPC code in Ex. 30. Shown
is a slice with $\lambda'_2 = 50$ that is parallel to the plane spanned by $\boldsymbol{\lambda}'_1$ and $\boldsymbol{\lambda}'_3$. Observe that the
$\lambda'_1$-axis is stretched compared to the $\lambda'_3$-axis. (See main text for explanations.) Left: SPA
decoding decision regions (max. 100 iterations). Right: LPD decision regions (white: all-zeros
codeword/black: non-zero (pseudo-)codeword).
• Using the results of Ex. 12 we see that for a signal-to-noise ratio of $E_{\mathrm{b}}/N_0 = 4.197\,\mathrm{dB}$ we
have $\mathrm{E}[\Lambda'_1 \mid \mathbf{X} = \mathbf{0}] = 15.54$, $\mathrm{E}[\Lambda'_2 \mid \mathbf{X} = \mathbf{0}] = 50.00$, and $\mathrm{E}[\Lambda'_i \mid \mathbf{X} = \mathbf{0}] = 0$ for $i \in \mathcal{I} \setminus \{1, 2\}$.
Moreover, $\sqrt{\mathrm{Var}[\Lambda'_i \mid \mathbf{X} = \mathbf{0}]} = 2.90$ for $i \in \mathcal{I}$.

• Let us briefly comment on the white triangle in Fig. 18 in the rectangle $0 \le \lambda'_1 < 1$ and
$0 \le \lambda'_2 < 10$. It can easily be shown that for $\boldsymbol{\lambda}$ in the vicinity of $\mathbf{0}$, the SPA decoder
can only decode successfully if $\boldsymbol{\lambda} > \mathbf{0}$. The above-mentioned white triangle corresponds
to the region where $\boldsymbol{\lambda} > \mathbf{0}$ and where $\|\boldsymbol{\lambda}\|_2$ is small.

• Similar plots as in Figs. 18 and 19 can be obtained under MSA decoding. Similarly to
SPA decoding, the closer $\boldsymbol{\lambda}$ lies to the decision boundary,
the more iterations are necessary. However, simulations show that the number of
required iterations before convergence to the zero codeword increases much more in the
case of MSA decoding.


□ 
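The numbers in the first comment above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming that each LLR component has conditional mean $4R E_{\mathrm{b}}/N_0$ and conditional variance $8R E_{\mathrm{b}}/N_0$ (as used in Sec. 6.1) and that the primed coordinates arise from an orthonormal change of basis, so that the Euclidean norm of the mean vector and the per-coordinate standard deviation are preserved. All variable names are illustrative.

```python
import math

n = 155                      # block length of the code in Ex. 30
rate = 62 / 155              # code rate 2/5
ebn0_db = 4.197              # signal-to-noise ratio in dB
ebn0 = 10 ** (ebn0_db / 10)  # linear scale

mean_llr = 4 * rate * ebn0            # E[Lambda_i | X = 0] in the original coordinates
std_llr = math.sqrt(8 * rate * ebn0)  # standard deviation of each LLR component

# Norm of the mean vector: identical in the original and the rotated coordinates.
norm_original = mean_llr * math.sqrt(n)
norm_rotated = math.hypot(15.54, 50.00)

print(round(std_llr, 2))
print(round(norm_original, 2), round(norm_rotated, 2))
```

Under these assumptions the two norms agree to within rounding, which is consistent with the mean vector lying in the plane spanned by the first two primed unit vectors.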

Without going much into the details, let us mention some connections of the fundamental
polytope to concepts like the marginal polytope (and relaxations thereof), Bethe free energy,
and the cycle/metric polytope in matroid theory. Marginal polytope: when translated to
coding theory, the marginal polytope [55] is the polytope spanned by all codewords, i.e. $\operatorname{conv}(\mathcal{C})$;
the fundamental polytope is then a relaxation of this marginal polytope. Bethe free energy:
consider the set of all possible vectors of beliefs on the variable and check nodes of a Tanner
graph, where the variable-node beliefs are $\{b_{X_i}(x_i)\}_{i \in \mathcal{I}(\mathbf{H})}$.
A vector in this set yields a smaller-than-infinity
Bethe free energy [56] if and only if the sub-vector containing the beliefs $\{b_{X_i}(1)\}_{i \in \mathcal{I}(\mathbf{H})}$
corresponds to a point in the fundamental polytope. Cycle/metric polytope in matroid
theory:$^{26}$ the cycle polytope of a binary matroid [57] is the polytope spanned by all codewords,
i.e. $\operatorname{conv}(\mathcal{C})$. The metric polytope is then a certain relaxation of this cycle polytope. In
fact, this relaxation equals 77r(H) in Def. 11 for r = j77(H)j and is therefore the fundamen¬ 
tal polytope of the parity-check matrix where all codewords of the dual code are included. 
Equivalently, it can also be seen as the intersection of all fundamental polytopes associated 
to all possible parity-check matrices for the given code. 

6 Definition and Properties of Pseudo-Weights 

After having seen different descriptions and properties of the fundamental polytope and cone,
we now turn our attention to the question of “how bad” a certain pseudo-codeword is, i.e. we
want to quantify pairwise error probabilities. Towards this end, let the pairwise error
probability between two codewords $\mathbf{x}$ and $\mathbf{x}'$ be the probability that upon sending the
codeword $\mathbf{x}$, MLD decides in favor of $\mathbf{x}'$ (assuming that only $\mathbf{x}$ and $\mathbf{x}'$ are competing at the
decoder). Similarly, we let the pairwise error probability between a codeword $\mathbf{x}$
and a pseudo-codeword $\boldsymbol{\omega}$ be the probability that upon sending the codeword $\mathbf{x}$, GCD/LPD
decides in favor of $\boldsymbol{\omega}$ (assuming that only $\mathbf{x}$ and $\boldsymbol{\omega}$ are competing at the decoder).

In the case of MLD of a binary code, the Hamming distance $d_{\mathrm{H}}(\mathbf{x}, \mathbf{x}') = w_{\mathrm{H}}(\mathbf{x}' - \mathbf{x})$
between two codewords $\mathbf{x}$ and $\mathbf{x}'$ is sufficient to deduce the pairwise error probability

$^{26}$Here is a small translation table from coding theory to matroid theory language: codes are binary matroids,
codewords are cycles, and cycle codes are graphic binary matroids.
Figure 20: Decision regions under MLD when only the zero codeword is competing against
the codeword $\mathbf{x}'$. (See text for more details.)




Figure 21: Left: decision regions under GCD/LPD when only the zero codeword is competing
against the pseudo-codeword $\boldsymbol{\omega}$. (See text for more details.) Right: same as left part; however,
in order to obtain a setup similar to the MLD case in Fig. 20 we defined $\boldsymbol{\omega}_{\mathrm{virt}}$ such
that the decision hyperplane is at the same Euclidean distance from $\gamma' \bar{\mathbf{0}}$ and from $\gamma' \bar{\boldsymbol{\omega}}_{\mathrm{virt}}$.


when transmitting over an AWGNC, a BSC, or a BEC. However, in the case of GCD/LPD
we need different measures for characterizing the pairwise error probability of
a codeword $\mathbf{x}$ and a pseudo-codeword $\boldsymbol{\omega}$. Therefore, in the following we will discuss the
AWGNC, the BSC, and the BEC separately.

6.1 AWGNC Pseudo-Weight 

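We first consider the case of an AWGNC, where we will first study the MLD pairwise error
probability and then the GCD/LPD pairwise error probability. So, let $\mathbf{x}' \ne \mathbf{0}$ be a codeword
and define the random variable $S' \triangleq \langle \mathbf{x}', \boldsymbol{\Lambda} \rangle - \langle \mathbf{0}, \boldsymbol{\Lambda} \rangle = \sum_{i \in \operatorname{supp}(\mathbf{x}')} \Lambda_i$. Knowing that the $\Lambda_i$'s
are statistically independent given $\mathbf{X} = \mathbf{0}$ (cf. Footnote 14) and using the results of Ex. 12,
we can easily find the distribution of $S'$ given $\mathbf{X} = \mathbf{0}$, i.e.

$$S'|_{\mathbf{X} = \mathbf{0}} \sim \mathcal{N}\!\left( 4R \frac{E_{\mathrm{b}}}{N_0} w_{\mathrm{H}}(\mathbf{x}'), \ 8R \frac{E_{\mathrm{b}}}{N_0} w_{\mathrm{H}}(\mathbf{x}') \right).$$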
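Because MLD decides in favor of $\mathbf{x}'$ and against $\mathbf{0}$ when $S' \le 0$ (cf. (19)), the pairwise error
probability turns out to be$^{27}$

$$P^{\mathrm{MLD}}_{\mathbf{0} \to \mathbf{x}'} = P(S' \le 0 \mid \mathbf{X} = \mathbf{0}) = Q\!\left( \frac{4R \frac{E_{\mathrm{b}}}{N_0} w_{\mathrm{H}}(\mathbf{x}')}{\sqrt{8R \frac{E_{\mathrm{b}}}{N_0} w_{\mathrm{H}}(\mathbf{x}')}} \right) = Q\!\left( \sqrt{2R \frac{E_{\mathrm{b}}}{N_0} w_{\mathrm{H}}(\mathbf{x}')} \right), \qquad (36)$$

where $Q(\theta)$ is as usual the integral from $\theta$ to $\infty$ of the normal distribution with mean 0 and
variance 1. We see that it is sufficient to know the Hamming weight of $\mathbf{x}'$ in order to compute
the MLD pairwise error probability. (In the general case, we need only to know the Hamming
distance between $\mathbf{x}$ and $\mathbf{x}'$ in order to compute the corresponding pairwise error probability.)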
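Graphically, the pairwise error probability can be represented as follows. First, let
$\gamma \triangleq \sqrt{R E_{\mathrm{b}}}$ and $\gamma' \triangleq 4\gamma/N_0$ (note that $\gamma\gamma' = 4R \frac{E_{\mathrm{b}}}{N_0}$). Secondly, define
$\bar{\mathbf{0}} \triangleq \gamma \cdot (\mathbf{1} - 2 \cdot \mathbf{0})$ and $\bar{\mathbf{x}}' \triangleq \gamma \cdot (\mathbf{1} - 2 \cdot \mathbf{x}')$ (cf. Ex. 12). Fig. 20 shows the plane of the LLR space
that contains the origin, the point $\gamma' \bar{\mathbf{0}}$, and the point $\gamma' \bar{\mathbf{x}}'$. (The point $\gamma' \bar{\mathbf{0}}$ corresponds
to the LLR vector that is obtained at the receiver if $\mathbf{X} = \mathbf{0}$ is transmitted and no noise is
added.) Rewriting $S'$ as

$$S' = \langle \mathbf{x}', \boldsymbol{\Lambda} \rangle - \langle \mathbf{0}, \boldsymbol{\Lambda} \rangle = \langle \mathbf{x}' - \mathbf{0}, \boldsymbol{\Lambda} \rangle = \left\langle \frac{\bar{\mathbf{0}} - \bar{\mathbf{x}}'}{2\gamma}, \boldsymbol{\Lambda} \right\rangle = \frac{1}{2\gamma\gamma'} \left\langle \gamma' (\bar{\mathbf{0}} - \bar{\mathbf{x}}'), \boldsymbol{\Lambda} \right\rangle \qquad (37)$$

we see that $S'$ is proportional to the projection of $\boldsymbol{\Lambda}$ onto the vector connecting $\gamma' \bar{\mathbf{x}}'$ to $\gamma' \bar{\mathbf{0}}$,
that $S' = 0$ on the line labeled “decision boundary”, and that $S' < 0$ in the shaded area. It
can easily be verified that the squared Euclidean distance from $\gamma' \bar{\mathbf{0}}$ to the decision boundary
is $(\gamma\gamma')^2 \cdot w_{\mathrm{H}}(\mathbf{x}')$. (The second-to-last inner product in (37) can be seen as doing the projection
in signal space, i.e. $\boldsymbol{\Lambda}/(2\gamma)$ is projected onto the vector connecting the signal space point $\bar{\mathbf{x}}'$
to the signal space point $\bar{\mathbf{0}}$.)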

In general, MLD results in a decision hyperplane that consists of all points that are equally
far away from the two competing codewords and so this hyperplane does not need to go
through the origin. However, when using binary codes and BPSK signaling all signals have
the same energy and so the decision hyperplane goes through the origin as in Fig. 20.

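Now we want to compute the pairwise error probability in the case of GCD/LPD. Let
$\boldsymbol{\omega} \in \mathcal{P}(\mathbf{H})$ be a pseudo-codeword and define $S \triangleq \langle \boldsymbol{\omega}, \boldsymbol{\Lambda} \rangle - \langle \mathbf{0}, \boldsymbol{\Lambda} \rangle = \sum_{i \in \mathcal{I}} \omega_i \Lambda_i$. Again, because
of the statistical independence of the $\Lambda_i$'s given $\mathbf{X} = \mathbf{0}$ we find that

$$S|_{\mathbf{X} = \mathbf{0}} \sim \mathcal{N}\!\left( 4R \frac{E_{\mathrm{b}}}{N_0} \|\boldsymbol{\omega}\|_1, \ 8R \frac{E_{\mathrm{b}}}{N_0} \|\boldsymbol{\omega}\|_2^2 \right).$$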

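Because GCD/LPD decides in favor of $\boldsymbol{\omega}$ and against $\mathbf{0}$ when $S \le 0$ (cf. (20)), the pairwise
error probability turns out to be$^{28}$

$$P^{\mathrm{GCD/LPD}}_{\mathbf{0} \to \boldsymbol{\omega}} = P(S \le 0 \mid \mathbf{X} = \mathbf{0}) = Q\!\left( \frac{4R \frac{E_{\mathrm{b}}}{N_0} \|\boldsymbol{\omega}\|_1}{\sqrt{8R \frac{E_{\mathrm{b}}}{N_0} \|\boldsymbol{\omega}\|_2^2}} \right) = Q\!\left( \sqrt{2R \frac{E_{\mathrm{b}}}{N_0} \frac{\|\boldsymbol{\omega}\|_1^2}{\|\boldsymbol{\omega}\|_2^2}} \right). \qquad (38)$$

It was the idea of Wiberg [12] to define a generalization of the Hamming weight such that (38)
looks formally like (36).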

$^{27}$The case $S' = 0$ results in a tie. Depending on how ties are resolved, MLD might actually decide in favor
of $\mathbf{0}$. However, $P(S' = 0 \mid \mathbf{X} = \mathbf{0}) = 0$.

$^{28}$A comment similar to Footnote 27 applies here.



















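Definition 31 ([12, 30]) Let $\boldsymbol{\omega} \in \mathbb{R}_+^n$. The AWGNC pseudo-weight $w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega})$ of $\boldsymbol{\omega}$ is
given by

$$w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega}) \triangleq \frac{\|\boldsymbol{\omega}\|_1^2}{\|\boldsymbol{\omega}\|_2^2}, \qquad (39)$$

where we define $w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega}) \triangleq 0$ if $\boldsymbol{\omega} = \mathbf{0}$. Wiberg [12, Ch. 6] called this quantity the
“generalized weight”, whereas Forney et al. [30] called it the “effective weight”. (Note that in
contrast to the Hamming weight, the AWGNC pseudo-weight is not a norm.) □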
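Definition 31 and the pairwise error probability (38) are straightforward to evaluate numerically. The following is a minimal sketch; the function names are illustrative, and the $Q$-function is expressed through the complementary error function of the Python standard library as $Q(\theta) = \tfrac{1}{2} \operatorname{erfc}(\theta/\sqrt{2})$.

```python
import math


def awgnc_pseudo_weight(omega):
    """AWGNC pseudo-weight ||omega||_1^2 / ||omega||_2^2 of a non-negative vector."""
    l1 = sum(omega)
    l2_squared = sum(w * w for w in omega)
    return 0.0 if l2_squared == 0 else l1 * l1 / l2_squared


def q_function(theta):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(theta / math.sqrt(2))


def pairwise_error_probability(omega, rate, ebn0):
    """Evaluate Q(sqrt(2 R Eb/N0 w_p(omega))); ebn0 is on a linear scale."""
    return q_function(math.sqrt(2 * rate * ebn0 * awgnc_pseudo_weight(omega)))


codeword = [1, 1, 1, 0, 0, 0, 0]                        # Hamming weight 3
pseudo_codeword = [0.5, 0.5, 0.5, 1.0, 0.5, 0.5, 0.5]   # pseudo-codeword of Ex. 29

print(awgnc_pseudo_weight(codeword))         # equals the Hamming weight
print(awgnc_pseudo_weight(pseudo_codeword))  # 4^2 / 2.5
print(pairwise_error_probability(codeword, rate=2 / 7, ebn0=2.0))
```

For binary vectors the function returns the Hamming weight, so the same routine covers the MLD case in (36).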

With this, Eq. (38) can be written as

$$P^{\mathrm{GCD/LPD}}_{\mathbf{0} \to \boldsymbol{\omega}} = Q\!\left( \sqrt{2R \frac{E_{\mathrm{b}}}{N_0} w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega})} \right),$$

which indeed looks formally like (36). With suitable definitions, the general case
can also be formulated by using a generalization of the Hamming distance. However, in contrast
to the Hamming distance, the resulting generalization of the Hamming distance will not be a
distance in the mathematical sense.

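Similar to the MLD case we can also give a graphical interpretation of the decision regions
in the GCD/LPD case. Fig. 21 shows the plane through the origin, the point $\gamma' \bar{\mathbf{0}}$, and the
point $\gamma' \bar{\boldsymbol{\omega}}$, where $\bar{\boldsymbol{\omega}} \triangleq \gamma \cdot (\mathbf{1} - 2 \cdot \boldsymbol{\omega})$. Rewriting $S$ as

$$S = \langle \boldsymbol{\omega}, \boldsymbol{\Lambda} \rangle - \langle \mathbf{0}, \boldsymbol{\Lambda} \rangle = \langle \boldsymbol{\omega} - \mathbf{0}, \boldsymbol{\Lambda} \rangle = \left\langle \frac{\bar{\mathbf{0}} - \bar{\boldsymbol{\omega}}}{2\gamma}, \boldsymbol{\Lambda} \right\rangle = \frac{1}{2\gamma\gamma'} \left\langle \gamma' (\bar{\mathbf{0}} - \bar{\boldsymbol{\omega}}), \boldsymbol{\Lambda} \right\rangle \qquad (40)$$

we see that $S$ is proportional to the projection of $\boldsymbol{\Lambda}$ onto the vector connecting $\gamma' \bar{\boldsymbol{\omega}}$ to $\gamma' \bar{\mathbf{0}}$,
that $S = 0$ on the line labeled “decision boundary”, and that $S < 0$ in the shaded area.
(The second-to-last inner product in (40) can be seen as doing the projection in signal space,
i.e. $\boldsymbol{\Lambda}/(2\gamma)$ is projected onto the vector connecting the signal space point $\bar{\boldsymbol{\omega}}$ to the signal space
point $\bar{\mathbf{0}}$.) In contrast to MLD, the two points $\gamma' \bar{\mathbf{0}}$ and $\gamma' \bar{\boldsymbol{\omega}}$ do not have the same distance
from the decision boundary in general; in fact, it can even happen that the two points lie on
the same side of the decision boundary. Finally, note that the squared Euclidean distance of
$\gamma' \bar{\mathbf{0}}$ to the decision boundary is now given by $(\gamma\gamma')^2 \cdot w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega})$, which looks formally like the
formula that we obtained in the case of MLD.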

It is clear that these geometrical observations can be connected to the discussion on linear 
programming at the end of Sec. 3; the details of this connection are left to the reader as an 
exercise. 


6.2 BSC Pseudo-Weight 

We first discuss MLD. Defining $S'$ as in Sec. 6.1 for a codeword $\mathbf{x}' \ne \mathbf{0}$, we see that a necessary
condition for $S'|_{\mathbf{X} = \mathbf{0}}$ to be non-positive is that the number of bit flips on the channel is at
least $\tfrac{1}{2} w_{\mathrm{H}}(\mathbf{x}')$. The BSC pseudo-weight is defined such that we can formally make the same
statement for GCD/LPD.
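Definition 32 ([30]) Let $\boldsymbol{\omega} \in \mathbb{R}_+^n$. Let $\boldsymbol{\omega}'$ be a vector of length $n$ with the same components
as $\boldsymbol{\omega}$ but in non-increasing order. Introducing

$$f(\xi) \triangleq \omega'_i \quad (i - 1 < \xi \le i, \ 0 < \xi \le n), \qquad F(\xi) \triangleq \int_0^{\xi} f(\xi') \, d\xi', \qquad e \triangleq F^{-1}\!\left( \frac{F(n)}{2} \right) = F^{-1}\!\left( \frac{\|\boldsymbol{\omega}\|_1}{2} \right),$$

the BSC pseudo-weight is defined to be $w_{\mathrm{p}}^{\mathrm{BSC}}(\boldsymbol{\omega}) \triangleq 2e$.$^{29}$ □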
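Because $F$ is piecewise linear, the quantity $e$ in Definition 32 can be found by a single pass over the sorted components. The following is a minimal sketch; the function name is illustrative, and returning 0 for the all-zero vector is an assumed convention chosen by analogy with Definition 31.

```python
def bsc_pseudo_weight(omega):
    """BSC pseudo-weight 2e, where F(e) equals half of ||omega||_1.

    The components are sorted in non-increasing order; F is then the
    piecewise-linear running sum, which we invert by linear interpolation.
    """
    ordered = sorted(omega, reverse=True)
    half = sum(ordered) / 2
    if half == 0:
        return 0.0  # assumed convention for the all-zero vector
    accumulated = 0.0
    for i, w in enumerate(ordered):
        if accumulated + w >= half:
            # F crosses the level `half` inside the interval (i, i + 1].
            e = i + (half - accumulated) / w
            return 2 * e
        accumulated += w
    return 2.0 * len(ordered)  # not reached for non-negative input


print(bsc_pseudo_weight([1, 1, 1, 0]))                   # binary vector of Hamming weight 3
print(bsc_pseudo_weight([1, 1, 0.5, 0.5, 0.5, 0.5]))     # half of the mass sits on the two largest entries
```

For binary vectors the routine returns the Hamming weight, in agreement with Lemma 36.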

With this definition and $S$ defined as in Sec. 6.1 we see that a necessary condition for
$S|_{\mathbf{X} = \mathbf{0}}$ to be non-positive is that the number of bit flips on the channel is at least $w_{\mathrm{p}}^{\mathrm{BSC}}(\boldsymbol{\omega})/2$.
Note however that the BSC pairwise error probability formulas for GCD/LPD are not simply
obtained from the BSC pairwise error probability formulas for MLD by replacing the Hamming
weight by the BSC pseudo-weight. Namely, whereas in the case of MLD it only matters how
many channel bit flips correspond to positions in $\operatorname{supp}(\mathbf{x}')$, in the case of GCD/LPD it not
only matters how many channel bit flips correspond to positions in $\operatorname{supp}(\boldsymbol{\omega})$ but also at which
positions these bit flips are.

Another way to generalize the Hamming weight in the case of the BSC is given by the 
fractional and max-fractional weight. 


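Definition 33 ([31]) The fractional and max-fractional weight of a vector $\boldsymbol{\omega} \in \mathbb{R}_+^n$ are
defined to be, respectively,

$$w_{\mathrm{frac}}(\boldsymbol{\omega}) \triangleq \|\boldsymbol{\omega}\|_1, \qquad (41)$$

$$w_{\mathrm{max\text{-}frac}}(\boldsymbol{\omega}) \triangleq \frac{\|\boldsymbol{\omega}\|_1}{\|\boldsymbol{\omega}\|_\infty}. \qquad (42)$$

For $\boldsymbol{\omega} = \mathbf{0}$ we define $w_{\mathrm{max\text{-}frac}}(\boldsymbol{\omega}) \triangleq 0$. We actually use a slightly different notation than [31].
Here, $w_{\mathrm{frac}}$ and $w_{\mathrm{max\text{-}frac}}$ are defined for any vector in $\mathbb{R}_+^n$, whereas in [31], $w_{\mathrm{frac}}$ and $w_{\mathrm{max\text{-}frac}}$
already denote the minimum of these values over all nonzero vertices of the fundamental
polytope. □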


Fix some non-zero vector $\boldsymbol{\omega} \in [0,1]^n$. Using the above definition, it can be seen that
a necessary condition for $S|_{\mathbf{X} = \mathbf{0}}$ to be non-positive is that the number of bit flips on the
channel is at least $\tfrac{1}{2} w_{\mathrm{frac}}(\boldsymbol{\omega})$. Similarly, fix some non-zero vector $\boldsymbol{\omega} \in \mathbb{R}_+^n$. Then, a necessary
condition for $S|_{\mathbf{X} = \mathbf{0}}$ to be non-positive is that the number of bit flips on the channel is at
least $\tfrac{1}{2} w_{\mathrm{max\text{-}frac}}(\boldsymbol{\omega})$. (The details of these two statements can be found in Sec. A.5.)
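The two necessary conditions above can be illustrated by brute force on a small vector. The following is a minimal sketch, assuming that the all-zero codeword is sent over a BSC and that every LLR equals $+L$ at an unflipped position and $-L$ at a flipped position for some $L > 0$; the function names are illustrative.

```python
from itertools import combinations


def fractional_weight(omega):
    return sum(omega)


def max_fractional_weight(omega):
    peak = max(omega)
    return 0.0 if peak == 0 else sum(omega) / peak


def normalized_s(omega, flipped):
    """S / L for a given set of flipped positions (assumed LLR model, see lead-in)."""
    return sum(-w if i in flipped else w for i, w in enumerate(omega))


def fewest_flips_with_nonpositive_s(omega):
    """Smallest number of bit flips for which S <= 0 is possible (exhaustive search)."""
    n = len(omega)
    for k in range(n + 1):
        for flipped in combinations(range(n), k):
            if normalized_s(omega, set(flipped)) <= 0:
                return k
    return None


omega = [0.5, 0.5, 0.5, 1.0, 0.5, 0.5, 0.5]  # pseudo-codeword of Ex. 29
print(fractional_weight(omega) / 2, max_fractional_weight(omega) / 2)
print(fewest_flips_with_nonpositive_s(omega))
```

For this vector both bounds ask for at least two bit flips, while the exhaustive search shows that three are actually needed; the conditions are necessary but not sufficient.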


6.3 BEC Pseudo-Weight 

We first discuss MLD. Defining $S'$ as in Sec. 6.1 for a codeword $\mathbf{x}' \ne \mathbf{0}$, we see that
a necessary condition for $S'|_{\mathbf{X} = \mathbf{0}}$ to be non-positive$^{30}$ is that the number of erasures on the
channel is at least $w_{\mathrm{H}}(\mathbf{x}')$. The BEC pseudo-weight is defined such that we can formally make
the same statement for GCD/LPD.

$^{29}$Note that the quantity $e$ is obviously related to the median of the “pdf” given by $f(\xi)/F(n)$. However,
let us remark that this is a different “distribution” than used later on in Lemma 39 when characterizing the
AWGNC pseudo-weight.

$^{30}$Because of special properties of the BEC, $S$ can never be negative.
Definition 34 ([30]) Let $\boldsymbol{\omega} \in \mathbb{R}_+^n$. The BEC pseudo-weight $w_{\mathrm{p}}^{\mathrm{BEC}}(\boldsymbol{\omega})$ is defined to be

$$w_{\mathrm{p}}^{\mathrm{BEC}}(\boldsymbol{\omega}) \triangleq |\operatorname{supp}(\boldsymbol{\omega})|.$$


□

With this definition and $S$ defined as in Sec. 6.1 we see that a necessary condition for
$S|_{\mathbf{X} = \mathbf{0}}$ to be non-positive is that the number of erasures on the channel is at least $w_{\mathrm{p}}^{\mathrm{BEC}}(\boldsymbol{\omega})$.
In contrast to the BSC, the BEC pairwise error probability formulas for GCD/LPD are
simply obtained from the BEC pairwise error probability formulas for MLD by replacing the
Hamming weight by the BEC pseudo-weight. (Note that the exact formulas depend on how
ties are resolved.)

6.4 Pseudo-Weight Properties 

This section collects different lemmas that characterize the different pseudo-weights and the 
fractional and max-fractional weights. 

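Lemma 35 The AWGNC, BSC, and BEC pseudo-weights and the max-fractional weight are
invariant under scaling by a positive scalar, i.e.

$$w_{\mathrm{p}}^{\mathrm{AWGNC}}(\alpha \cdot \boldsymbol{\omega}) = w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{p}}^{\mathrm{BSC}}(\alpha \cdot \boldsymbol{\omega}) = w_{\mathrm{p}}^{\mathrm{BSC}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{p}}^{\mathrm{BEC}}(\alpha \cdot \boldsymbol{\omega}) = w_{\mathrm{p}}^{\mathrm{BEC}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{max\text{-}frac}}(\alpha \cdot \boldsymbol{\omega}) = w_{\mathrm{max\text{-}frac}}(\boldsymbol{\omega}),$$

for any $\alpha \in \mathbb{R}_{++}$ and any $\boldsymbol{\omega} \in \mathbb{R}_+^n$. Note that the fractional weight is not scaling-invariant.
Proof: Follows easily from the definitions. □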

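Lemma 36 If $\boldsymbol{\omega} \in \{0,1\}^n$ then the AWGNC, the BSC, and the BEC pseudo-weights and
the fractional and max-fractional weight reduce to the Hamming weight, i.e. $w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega}) = w_{\mathrm{H}}(\boldsymbol{\omega})$,
etc.

Proof: This is straightforward. E.g. in the case of an AWGNC the result follows from
observing that $\|\boldsymbol{\omega}\|_1 = w_{\mathrm{H}}(\boldsymbol{\omega})$ and that $\|\boldsymbol{\omega}\|_2^2 = w_{\mathrm{H}}(\boldsymbol{\omega})$, which implies that $w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega}) =
\|\boldsymbol{\omega}\|_1^2 / \|\boldsymbol{\omega}\|_2^2 = w_{\mathrm{H}}(\boldsymbol{\omega})^2 / w_{\mathrm{H}}(\boldsymbol{\omega}) = w_{\mathrm{H}}(\boldsymbol{\omega})$. □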

The following definitions generalize the notion of the minimum Hamming weight of a 
binary linear code. 

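Definition 37 The minimum AWGNC, BSC, and BEC pseudo-weight and the minimum
fractional and max-fractional weights are defined to be, respectively,

$$w_{\mathrm{p}}^{\mathrm{AWGNC,min}}(\mathbf{H}) \triangleq \min_{\boldsymbol{\omega} \in \mathcal{V}(\mathcal{P}(\mathbf{H})) \setminus \{\mathbf{0}\}} w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{p}}^{\mathrm{BSC,min}}(\mathbf{H}) \triangleq \min_{\boldsymbol{\omega} \in \mathcal{V}(\mathcal{P}(\mathbf{H})) \setminus \{\mathbf{0}\}} w_{\mathrm{p}}^{\mathrm{BSC}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{p}}^{\mathrm{BEC,min}}(\mathbf{H}) \triangleq \min_{\boldsymbol{\omega} \in \mathcal{V}(\mathcal{P}(\mathbf{H})) \setminus \{\mathbf{0}\}} w_{\mathrm{p}}^{\mathrm{BEC}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{frac}}^{\mathrm{min}}(\mathbf{H}) \triangleq \min_{\boldsymbol{\omega} \in \mathcal{V}(\mathcal{P}(\mathbf{H})) \setminus \{\mathbf{0}\}} w_{\mathrm{frac}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{max\text{-}frac}}^{\mathrm{min}}(\mathbf{H}) \triangleq \min_{\boldsymbol{\omega} \in \mathcal{V}(\mathcal{P}(\mathbf{H})) \setminus \{\mathbf{0}\}} w_{\mathrm{max\text{-}frac}}(\boldsymbol{\omega}),$$

where $\mathcal{V}(\mathcal{P}(\mathbf{H})) \setminus \{\mathbf{0}\}$ is the set of all non-zero vertices of the fundamental polytope $\mathcal{P}(\mathbf{H})$. □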

It is important to note that the above minimal weights depend on the choice of parity- 
check matrix H, i.e. different parity-check matrices for the same code can lead to different 
minimal weights. This is in contrast to the minimal Hamming weight of a code which is 
independent of the specific choice of parity-check matrix by which a binary linear code is 
represented. 


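Lemma 38

$$w_{\mathrm{p}}^{\mathrm{AWGNC,min}}(\mathbf{H}) = \min_{\boldsymbol{\omega} \in \mathcal{P}(\mathbf{H}) \setminus \{\mathbf{0}\}} w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega}) = \min_{\boldsymbol{\omega} \in \mathcal{K}(\mathbf{H}) \setminus \{\mathbf{0}\}} w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{p}}^{\mathrm{BSC,min}}(\mathbf{H}) = \min_{\boldsymbol{\omega} \in \mathcal{P}(\mathbf{H}) \setminus \{\mathbf{0}\}} w_{\mathrm{p}}^{\mathrm{BSC}}(\boldsymbol{\omega}) = \min_{\boldsymbol{\omega} \in \mathcal{K}(\mathbf{H}) \setminus \{\mathbf{0}\}} w_{\mathrm{p}}^{\mathrm{BSC}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{p}}^{\mathrm{BEC,min}}(\mathbf{H}) = \min_{\boldsymbol{\omega} \in \mathcal{P}(\mathbf{H}) \setminus \{\mathbf{0}\}} w_{\mathrm{p}}^{\mathrm{BEC}}(\boldsymbol{\omega}) = \min_{\boldsymbol{\omega} \in \mathcal{K}(\mathbf{H}) \setminus \{\mathbf{0}\}} w_{\mathrm{p}}^{\mathrm{BEC}}(\boldsymbol{\omega}),$$
$$w_{\mathrm{max\text{-}frac}}^{\mathrm{min}}(\mathbf{H}) = \min_{\boldsymbol{\omega} \in \mathcal{P}(\mathbf{H}) \setminus \{\mathbf{0}\}} w_{\mathrm{max\text{-}frac}}(\boldsymbol{\omega}) = \min_{\boldsymbol{\omega} \in \mathcal{K}(\mathbf{H}) \setminus \{\mathbf{0}\}} w_{\mathrm{max\text{-}frac}}(\boldsymbol{\omega}).$$

Note that there is no such statement for the fractional weight.

Proof: These are simple consequences of the fact that the AWGNC, BSC, and BEC
pseudo-weights and the max-fractional weight are scaling-invariant, that Lemma 41 holds, and that
$\mathcal{K}(\mathbf{H}) \setminus \{\mathbf{0}\} = \operatorname{conic}(\mathcal{P}(\mathbf{H})) \setminus \{\mathbf{0}\}$. □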


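In the following, our standard channel will be the AWGNC. Therefore, when nothing else
is specified, pseudo-weight will mean AWGNC pseudo-weight and we will write $w_{\mathrm{p}}(\boldsymbol{\omega})$ and
$w_{\mathrm{p}}^{\mathrm{min}}(\mathbf{H})$ instead of $w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega})$ and $w_{\mathrm{p}}^{\mathrm{AWGNC,min}}(\mathbf{H})$, respectively.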


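Lemma 39 Let $\boldsymbol{\omega} \in \mathbb{R}_+^n$ and let $\mathcal{S} = \operatorname{supp}(\boldsymbol{\omega})$ be its support. Consider the non-zero
entries of $\boldsymbol{\omega}$ to be $|\mathcal{S}|$ samples of a positive random variable $\Omega$. Introducing the empirical
first moment (mean) $\mathrm{E}[\Omega] = (1/|\mathcal{S}|) \sum_{i \in \mathcal{S}} \omega_i = (1/|\mathcal{S}|) \|\boldsymbol{\omega}\|_1$, the empirical second moment
$\mathrm{E}[\Omega^2] = (1/|\mathcal{S}|) \sum_{i \in \mathcal{S}} \omega_i^2 = (1/|\mathcal{S}|) \|\boldsymbol{\omega}\|_2^2$, and the empirical variance $\mathrm{Var}[\Omega] = \mathrm{E}[\Omega^2] - (\mathrm{E}[\Omega])^2$,
we can rewrite the AWGNC pseudo-weight as

$$w_{\mathrm{p}}(\boldsymbol{\omega}) = |\mathcal{S}| \cdot \frac{(\mathrm{E}[\Omega])^2}{\mathrm{E}[\Omega^2]}. \qquad (43)$$

In the case that $\boldsymbol{\omega}$ is scaled such that $\mathrm{E}[\Omega] = 1$ (i.e. $\|\boldsymbol{\omega}\|_1 = |\mathcal{S}|$) we can write

$$w_{\mathrm{p}}(\boldsymbol{\omega}) = |\mathcal{S}| \cdot \frac{1}{\mathrm{Var}[\Omega] + 1}. \qquad (44)$$

Therefore, the more the non-zero components of $\boldsymbol{\omega}$ are spread apart, the smaller is the AWGNC
pseudo-weight.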

Proof: See Sec. A.6. □ 
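The moment form of the pseudo-weight in Lemma 39 can be checked numerically against the definition. The following is a minimal sketch with illustrative function names; the second function first rescales the vector so that its empirical mean over the support equals one, which is permitted by the scaling invariance of Lemma 35.

```python
def awgnc_pseudo_weight(omega):
    """Direct evaluation of ||omega||_1^2 / ||omega||_2^2."""
    return sum(omega) ** 2 / sum(w * w for w in omega)


def pseudo_weight_from_moments(omega):
    """Evaluate |S| (E[Omega])^2 / E[Omega^2] over the support of omega."""
    support = [w for w in omega if w != 0]
    size = len(support)
    mean = sum(support) / size
    second_moment = sum(w * w for w in support) / size
    return size * mean * mean / second_moment


def pseudo_weight_from_variance(omega):
    """Rescale to unit empirical mean, then evaluate |S| / (Var[Omega] + 1)."""
    support = [w for w in omega if w != 0]
    size = len(support)
    mean = sum(support) / size
    scaled = [w / mean for w in support]
    variance = sum(w * w for w in scaled) / size - 1.0  # the mean of `scaled` is 1
    return size / (variance + 1.0)


omega = [1, 1, 0.5, 0.5, 0.5, 0.5, 0, 0]
print(awgnc_pseudo_weight(omega))
print(pseudo_weight_from_moments(omega))
print(pseudo_weight_from_variance(omega))
```

All three evaluations agree up to rounding, and increasing the spread of the non-zero entries lowers the value, as the lemma states.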


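Lemma 40 Let $\boldsymbol{\omega} \in \mathbb{R}_+^n$ and let $\angle(\boldsymbol{\omega}, \mathbf{1})$ be the angle between the vectors $\boldsymbol{\omega}$ and $\mathbf{1}$.
Interestingly, $w_{\mathrm{p}}(\boldsymbol{\omega})$ is only a function of $n$ and the angle $\angle(\boldsymbol{\omega}, \mathbf{1})$:

$$w_{\mathrm{p}}(\boldsymbol{\omega}) = n \cdot \cos\big(\angle(\boldsymbol{\omega}, \mathbf{1})\big)^2. \qquad (45)$$

We see that the larger the angle $\angle(\boldsymbol{\omega}, \mathbf{1})$ becomes, the smaller is $w_{\mathrm{p}}(\boldsymbol{\omega})$. Alternatively, if we
let $\mathbf{1}_{\boldsymbol{\omega}}$ be the indicator vector of $\boldsymbol{\omega}$, i.e. the $i$-th position is 1 if $\omega_i$ is non-zero and it is 0
otherwise, then

$$w_{\mathrm{p}}(\boldsymbol{\omega}) = |\operatorname{supp}(\boldsymbol{\omega})| \cdot \cos\big(\angle(\boldsymbol{\omega}, \mathbf{1}_{\boldsymbol{\omega}})\big)^2. \qquad (46)$$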

Proof: See Sec. A.7. □ 

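Lemma 41 For any positive integer $L$, let $\{\boldsymbol{\omega}^{(\ell)}\}_{\ell \in [L]}$ be a set of vectors where $\boldsymbol{\omega}^{(\ell)} \in \mathbb{R}_+^n$,
$\ell \in [L]$. Then,

$$w_{\mathrm{p}}^{\mathrm{AWGNC}}\Big( \sum_{\ell \in [L]} \alpha_\ell \boldsymbol{\omega}^{(\ell)} \Big) \ge \min_{\ell \in [L]} w_{\mathrm{p}}^{\mathrm{AWGNC}}(\boldsymbol{\omega}^{(\ell)}),$$
$$w_{\mathrm{p}}^{\mathrm{BSC}}\Big( \sum_{\ell \in [L]} \alpha_\ell \boldsymbol{\omega}^{(\ell)} \Big) \ge \min_{\ell \in [L]} w_{\mathrm{p}}^{\mathrm{BSC}}(\boldsymbol{\omega}^{(\ell)}),$$
$$w_{\mathrm{p}}^{\mathrm{BEC}}\Big( \sum_{\ell \in [L]} \alpha_\ell \boldsymbol{\omega}^{(\ell)} \Big) \ge \min_{\ell \in [L]} w_{\mathrm{p}}^{\mathrm{BEC}}(\boldsymbol{\omega}^{(\ell)}),$$
$$w_{\mathrm{max\text{-}frac}}\Big( \sum_{\ell \in [L]} \alpha_\ell \boldsymbol{\omega}^{(\ell)} \Big) \ge \min_{\ell \in [L]} w_{\mathrm{max\text{-}frac}}(\boldsymbol{\omega}^{(\ell)}),$$

for any $\alpha_\ell \ge 0$, $\ell \in [L]$, where not all $\alpha_\ell$ are zero. This means that the AWGNC
pseudo-weight of any conic combination of an arbitrary set of vectors in $\mathbb{R}_+^n$ is at least as large as the
smallest AWGNC pseudo-weight of any of these vectors. This property is intuitively clear from
the geometrical meaning of the AWGNC pseudo-weight. (Similar statements can be made for
the BSC and BEC pseudo-weight and for the max-fractional weight.)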


Proof: See Sec. A.8. 


□ 
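The AWGNC part of Lemma 41 lends itself to a quick randomized sanity check. The following sketch is only an illustration under our own assumptions (non-negative random vectors, a fixed seed, a small numerical tolerance); it is not a proof.

```python
import random


def awgnc_pseudo_weight(omega):
    """AWGNC pseudo-weight ||omega||_1^2 / ||omega||_2^2 of a non-zero vector."""
    s1 = sum(omega)
    s2 = sum(w * w for w in omega)
    return s1 * s1 / s2


def conic_bound_holds(num_trials=200, n=6, num_vectors=3, seed=0):
    """Check w_p(sum_l alpha_l * omega_l) >= min_l w_p(omega_l) on random instances."""
    rng = random.Random(seed)
    for _ in range(num_trials):
        vectors = [[rng.random() for _ in range(n)] for _ in range(num_vectors)]
        alphas = [rng.random() for _ in range(num_vectors)]
        combo = [sum(a * v[i] for a, v in zip(alphas, vectors)) for i in range(n)]
        smallest = min(awgnc_pseudo_weight(v) for v in vectors)
        if awgnc_pseudo_weight(combo) < smallest - 1e-9:
            return False
    return True


all_trials_ok = conic_bound_holds()
```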
Lemma 42 For any positive integer $L$, let $\{\omega^{(\ell)}\}_{\ell \in [L]}$ be a set of vectors where $\omega^{(\ell)} \in \mathbb{R}_+^n$,

i G [L]. If = 1 for all I G [L] then 


_i^ 

teiL] flfFF) 

for any ct^ ^ 0, ^ G [L], such that Y1 iI^[l] 

Proof: See Sec. A.9. 


Lemma 43 Let $\omega \in \mathbb{R}_+^n$. Then

\[ \frac{\partial}{\partial \omega_i} w_p(\omega) \;
   \begin{cases}
     > 0 & \text{if } \omega_i < \|\omega\|_1 / w_p(\omega) \\
     = 0 & \text{if } \omega_i = \|\omega\|_1 / w_p(\omega) \\
     < 0 & \text{if } \omega_i > \|\omega\|_1 / w_p(\omega)
   \end{cases} \tag{47} \]

Proof: See Sec. A.10. □

Roughly speaking, the above lemma means that if we are given a vector $\omega \in \mathbb{R}^n$ and want
to decrease its AWGNC pseudo-weight then we must either decrease the small components
or increase the large components. In both cases the empirical variance increases, which is in
agreement with the observations in Lemma 39.
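The sign pattern of Lemma 43 can be observed with a central finite difference. The sketch below is our own illustration; the step size and the sample vector are assumptions, and the threshold $\|\omega\|_1 / w_p(\omega)$ is taken from (47).

```python
def awgnc_pseudo_weight(omega):
    """AWGNC pseudo-weight ||omega||_1^2 / ||omega||_2^2 of a non-zero vector."""
    s1 = sum(omega)
    s2 = sum(w * w for w in omega)
    return s1 * s1 / s2


def numerical_partial(omega, i, h=1e-6):
    """Central finite-difference estimate of d w_p / d omega_i."""
    up = list(omega)
    up[i] += h
    down = list(omega)
    down[i] -= h
    return (awgnc_pseudo_weight(up) - awgnc_pseudo_weight(down)) / (2 * h)


def predicted_sign(omega, i):
    """Sign predicted by comparing omega_i with ||omega||_1 / w_p(omega)."""
    threshold = sum(omega) / awgnc_pseudo_weight(omega)
    if omega[i] < threshold:
        return 1
    if omega[i] > threshold:
        return -1
    return 0


omega = [3.0, 1.0, 1.0, 0.5]  # illustrative vector; threshold is about 2.05
numeric_signs = [1 if numerical_partial(omega, i) > 0 else -1 for i in range(len(omega))]
predicted_signs = [predicted_sign(omega, i) for i in range(len(omega))]
```

Only the large first component lies above the threshold, so increasing it lowers the pseudo-weight, while increasing any of the small components raises it.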

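Lemma 44 Let $\omega \in \mathbb{R}^n$ with $0 \le \omega \le 1$. Remember that $w_p(\omega) = w_p^{\mathrm{AWGNC}}(\omega)$ by definition.
Then

\[ w_{\text{frac}}(\omega) \le w_{\text{max-frac}}(\omega) \le w_p(\omega) \le w_p^{\mathrm{BEC}}(\omega), \tag{48} \]
\[ w_{\text{frac}}(\omega) \le w_{\text{max-frac}}(\omega) \le w_p^{\mathrm{BSC}}(\omega) \le w_p^{\mathrm{BEC}}(\omega), \tag{49} \]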

and 

«,C(H) < ,5„) 

Proof: See Sec. A. 11. □ 

Note that there is no hierarchy between $w_p(\omega)$ and $w_p^{\mathrm{BSC}}(\omega)$, i.e. one can find $\omega$'s such
that either one is larger. Consider for example u> = (1, 1, ^) for which the AWGNC 

pseudo-weight is larger: $w_p(\omega) = 16/3 \approx 5.333 > w_p^{\mathrm{BSC}}(\omega) = 2 \cdot 2 = 4$. However, the
vector $\omega = (1, \tfrac{1}{4}, \ldots, \tfrac{1}{4})$ of length 65 is an example where the BSC pseudo-weight is larger:
$w_p(\omega) = 289/5 = 57.8 < w_p^{\mathrm{BSC}}(\omega) = 2 \cdot 31 = 62$.
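The length-65 example can be reproduced with exact arithmetic. In the sketch below, the BSC pseudo-weight is computed under an assumed simplified reading of Def. 34: twice the smallest number $e$ of largest components whose sum reaches $\|\omega\|_1 / 2$. That reading is consistent with the numbers quoted above but is not a substitute for the definition itself.

```python
from fractions import Fraction


def awgnc_pseudo_weight(omega):
    """AWGNC pseudo-weight ||omega||_1^2 / ||omega||_2^2 of a non-zero vector."""
    s1 = sum(omega)
    s2 = sum(w * w for w in omega)
    return s1 * s1 / s2


def bsc_pseudo_weight(omega):
    """Assumed simplified form: 2e, where e is the smallest number of largest
    components whose sum is at least half of ||omega||_1."""
    ordered = sorted(omega, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for e, value in enumerate(ordered, start=1):
        running += value
        if running >= half:
            return 2 * e
    raise ValueError("omega must be non-zero and non-negative")


omega = [Fraction(1)] + [Fraction(1, 4)] * 64
awgnc = awgnc_pseudo_weight(omega)
bsc = bsc_pseudo_weight(omega)
```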

Asymptotically, i.e. for $n \to \infty$, the AWGNC and BSC pseudo-weight can vary drastically
in the following sense. In Prop. 49 we will show that $w_p^{\min}(\cdot)$ always grows sub-linearly for an
ensemble of $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC codes where $3 \le w_{\mathrm{col}} < w_{\mathrm{row}}$. However, for properly
chosen families of $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC codes one can guarantee a linear behavior of
$w_p^{\mathrm{BSC},\min}(\cdot)$ as $n \to \infty$ [58]. Some of the reasons and implications of this fact are also discussed
in [59].

The above considerations also have implications for the fractional and max-fractional
weight (see Def. 33) that were introduced in [31] to analyze the decoding behavior when
transmitting over a BSC. Using Lemma 44 we see that when considering the limit $n \to \infty$ the
fractional and the max-fractional weight can grow at best like the AWGNC pseudo-weight.
However, the comments in the previous paragraph show that the AWGNC and BSC
pseudo-weight can behave quite differently for $n \to \infty$; therefore the fractional/max-fractional weight
and the BSC pseudo-weight can also behave quite differently for $n \to \infty$. Note though that
from an analysis point of view, the fractional weight might sometimes be a more manageable
quantity since it is a linear function of the argument whereas the BSC pseudo-weight is a more
complicated function. Indeed, [31, Sec. 4.4.3] shows an efficient procedure for computing the
minimal fractional weight of a code with given parity-check matrix.

7 A Simple Upper Bound on the Minimum AWGNC Pseudo-Weight

In this section we investigate the asymptotic behavior of the minimum pseudo-weight of
families of $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC codes, i.e. codes whose parity-check matrices have a
fixed column and row weight. Our main result will be that the relative minimum AWGNC
pseudo-weight of any $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular code, $3 \le w_{\mathrm{col}} < w_{\mathrm{row}}$, approaches zero as $n \to \infty$, a
behavior which is in sharp contrast to the observation made by Gallager [2] that the relative
minimum Hamming weight of a randomly generated $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC code,
$3 \le w_{\mathrm{col}} < w_{\mathrm{row}}$, is lower bounded by a nonzero number with probability one for $n \to \infty$.

In the following, we associate the Tanner graph $T = T(H)$ to the parity-check matrix $H$
and denote its girth and diameter by $g(T)$ and $\delta(T)$, respectively.

Definition 45 Let $T$ be a Tanner graph of an arbitrary code (not necessarily
$(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular). We let an arbitrary variable node $V$ of $T$ be the root. We classify the remaining
variable and check nodes according to their (graph) distance from the root, i.e. all nodes at
distance 1 from the root will be called nodes of tier 1, all nodes at distance 2 from the root node
will be called nodes of tier 2, etc. We call this ordering “breadth-first spanning-tree ordering
with root $V$.” Because $T$ is bipartite, it follows easily that the nodes of the even
tiers are variable nodes whereas the nodes of the odd tiers are check nodes. Furthermore, a
check node at tier $2t + 1$ can only be connected to variable nodes in tier $2t$ and possibly to
variable nodes in tier $2t + 2$. Note that the last tier is tier $\delta(T)$ and that the symbol nodes are
at tiers $0, 2, \ldots, 2\lfloor \delta(T)/2 \rfloor$. □

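Let us upper bound the number of nodes for each tier when we perform breadth-first
spanning-tree ordering according to Def. 45 with respect to an arbitrary node $V$ of the Tanner
graph $T$ of an arbitrary $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC code. Let $N_{V,t}(T)$ be the number of nodes
at tier $t$ and let $N_t^{\max} = N_t^{\max}(w_{\mathrm{col}}, w_{\mathrm{row}})$ be the maximal number of nodes possible at tier $t$ for any
$(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC code. It is not difficult to see that $N_0^{\max} = 1$, $N_1^{\max} = w_{\mathrm{col}}$,
$N_2^{\max} = w_{\mathrm{col}}(w_{\mathrm{row}} - 1)$, $N_3^{\max} = w_{\mathrm{col}}(w_{\mathrm{row}} - 1)(w_{\mathrm{col}} - 1)$,
$N_4^{\max} = w_{\mathrm{col}}(w_{\mathrm{row}} - 1)(w_{\mathrm{col}} - 1)(w_{\mathrm{row}} - 1)$. In general,
$N_{2t}^{\max} = w_{\mathrm{col}}(w_{\mathrm{col}} - 1)^{t-1}(w_{\mathrm{row}} - 1)^{t}$ for $t > 0$ and
$N_{2t+1}^{\max} = w_{\mathrm{col}}(w_{\mathrm{col}} - 1)^{t}(w_{\mathrm{row}} - 1)^{t}$ for $t \ge 0$.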
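The tier bounds are easy to tabulate. The following sketch (the function name is our own) evaluates the maximal tier sizes from the closed-form expressions for even and odd tiers; it describes the tree-like worst case and says nothing about a particular graph.

```python
def max_tier_size(t, w_col, w_row):
    """Maximal number of nodes at tier t for a (w_col, w_row)-regular code."""
    if t < 0:
        raise ValueError("tier index must be non-negative")
    if t == 0:
        return 1  # the root variable node
    half, is_odd = divmod(t, 2)
    if is_odd:
        # Check-node tier 2*half + 1.
        return w_col * (w_col - 1) ** half * (w_row - 1) ** half
    # Variable-node tier 2*half with half > 0.
    return w_col * (w_col - 1) ** (half - 1) * (w_row - 1) ** half


sizes_3_6 = [max_tier_size(t, 3, 6) for t in range(5)]
```

For a (3,6)-regular code the first five tiers can hold at most 1, 3, 15, 30, and 150 nodes.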

Footnote (on the restriction to regular codes in Sec. 7): Although similar methods can be devised for irregular LDPC codes, we focus on the regular case only.

Footnote (on the term “relative” in Sec. 7): In the same way as the relative Hamming weight of a vector is the Hamming weight of the vector divided
by $n$, we can define relative pseudo-weights for all the pseudo-weights that were introduced in Sec. 6.

Figure 22: Left: Tanner graph for the [7,4,3] code in Ex. 47. Middle: Canonical completion
with respect to node $X_1$. Right: Another pseudo-codeword.


Definition 46 Let $T$ be the Tanner graph of a code whose parity-check matrix $H$ has uniform
row weight $w_{\mathrm{row}}$. After performing the breadth-first spanning-tree ordering with an arbitrary
variable node $V$ as root we construct a pseudo-codeword $\omega$ in the following way. If bit $i$
corresponds to a variable node in tier $2t$, then

\[ \omega_i = \frac{1}{(w_{\mathrm{row}} - 1)^{t}}. \tag{52} \]

We call this the canonical completion with root $V$. It will be shown in Lemma 48 that
$\omega \in \mathcal{K}(H)$, i.e. $\omega$ is a pseudo-codeword. □


Example 47 Fig. 22 (left) shows the Tanner graph of a [7,4,3] binary linear code. (It is the
length-7 Hamming code.) Note that in this Tanner graph, all check nodes have degree four,
i.e. $w_{\mathrm{row}} = 4$. Performing breadth-first spanning-tree ordering with root $X_1$ we see that tier
0 consists of $\{X_1\}$, tier 2 consists of $\{X_4, X_6, X_7\}$, and tier 4 consists of $\{X_2, X_3, X_5\}$.
Correspondingly, the canonical completion with root $X_1$ yields the vector
$\omega = (1, \tfrac{1}{9}, \tfrac{1}{9}, \tfrac{1}{3}, \tfrac{1}{9}, \tfrac{1}{3}, \tfrac{1}{3})$
shown in Fig. 22 (middle). It is easy to check that $\omega$ is inside the fundamental cone for this
graph and is therefore a pseudo-codeword. The AWGNC pseudo-weight for $\omega$ equals

\[ w_p(\omega) = \frac{\big(1 + \tfrac{1}{9} + \tfrac{1}{9} + \tfrac{1}{3} + \tfrac{1}{9} + \tfrac{1}{3} + \tfrac{1}{3}\big)^2}{1 + \tfrac{1}{81} + \tfrac{1}{81} + \tfrac{1}{9} + \tfrac{1}{81} + \tfrac{1}{9} + \tfrac{1}{9}} = 3.973. \]


(As an aside, we note that the Tanner graph in Fig. 22 (left) also supports a pseudo-codeword 
Lo' of type Lo' = (1, 0,0, 0, |) whose AWGNC pseudo-weight equals only three and is thus 

at “minimum distance” for this code, see Fig. 22 (right).) □ 
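The canonical completion of Def. 46 can be sketched in a few lines of Python. The parity-check matrix below is an assumption: it is one parity-check matrix of a length-7 Hamming code with row weight four that is consistent with the tier structure of this example, and it is not claimed to be the matrix drawn in Fig. 22. The cone test uses the standard inequality description of the fundamental cone (non-negativity, and no component of a check exceeding the sum of the other components of that check).

```python
from collections import deque
from fractions import Fraction

# Hypothetical parity-check matrix (assumption, see lead-in).
H = [
    [1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 1, 1],
]


def canonical_completion(H, root):
    """Breadth-first search from variable `root` (0-based). A variable at graph
    distance 2t from the root receives the value (w_row - 1)^(-t).
    Assumes a connected Tanner graph with uniform row weight."""
    n = len(H[0])
    w_row = sum(H[0])
    depth = {root: 0}  # depth t corresponds to tier 2t
    seen_checks = set()
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for j, row in enumerate(H):
            if row[i] and j not in seen_checks:
                seen_checks.add(j)
                for k in range(n):
                    if row[k] and k not in depth:
                        depth[k] = depth[i] + 1
                        queue.append(k)
    return [Fraction(1, (w_row - 1) ** depth[k]) for k in range(n)]


def in_fundamental_cone(H, omega):
    """Check omega >= 0 and omega_i <= sum of the other entries in each check."""
    if any(w < 0 for w in omega):
        return False
    for row in H:
        members = [i for i, h in enumerate(row) if h]
        total = sum(omega[i] for i in members)
        if any(omega[i] > total - omega[i] for i in members):
            return False
    return True


def awgnc_pseudo_weight(omega):
    s1 = sum(omega)
    s2 = sum(w * w for w in omega)
    return s1 * s1 / s2


omega = canonical_completion(H, root=0)
weight = awgnc_pseudo_weight(omega)  # 147/37, approximately 3.973
```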

Without going into the details, let us mention that Def. 46 can be generalized in the 
following way: instead of doing a canonical completion with respect to a single variable node, 
one might do a canonical completion with respect to a set of variable nodes. The entries of 
the pseudo-vector will then be defined according to the graph distance to this set of nodes. 
This generalized notion of canonical completion was e.g. used in [60, 50]. 


Lemma 48 Let $T$ be the Tanner graph of a code whose parity-check matrix $H$ has uniform
row weight $w_{\mathrm{row}}$. The canonical completion with an arbitrary codeword symbol node $V$ as root
yields a vector $\omega$ such that $\omega$ is in the fundamental cone $\mathcal{K}(H)$. The vector $\omega$ has AWGNC
pseudo-weight $w_p(\omega) = \|\omega\|_1^2 / \|\omega\|_2^2$, where

\[ \|\omega\|_1 = \sum_{t=0}^{\lfloor \delta(T)/2 \rfloor} \frac{N_{V,2t}(T)}{(w_{\mathrm{row}} - 1)^{t}}, \tag{53} \]
\[ \|\omega\|_2^2 = \sum_{t=0}^{\lfloor \delta(T)/2 \rfloor} \frac{N_{V,2t}(T)}{(w_{\mathrm{row}} - 1)^{2t}}. \tag{54} \]

Proof: See Sec. A.12. □

For a given $T$, one can numerically calculate the pseudo-weight of the pseudo-codeword
given by the canonical completion for any given root; this will always yield an upper bound
on $w_p^{\min}(C)$. In the next proposition we will see that the canonical-completion approach is
powerful enough to show that $w_p^{\min}(C)$ can at best only grow sub-linearly for
$(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC codes with $3 \le w_{\mathrm{col}} < w_{\mathrm{row}}$.


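Proposition 49 Let $H$ be the $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular parity-check matrix of a length-$n$ LDPC
code $C$ with $3 \le w_{\mathrm{col}} < w_{\mathrm{row}}$. Then the minimum pseudo-weight is upper bounded by

\[ w_p^{\min}(H) \le \beta' \cdot n^{\beta}, \tag{55} \]

where

\[ \beta' = \beta'(w_{\mathrm{col}}, w_{\mathrm{row}}) = \frac{w_{\mathrm{col}}(w_{\mathrm{col}} - 1)}{w_{\mathrm{col}} - 2}, \qquad
   \beta = \beta(w_{\mathrm{col}}, w_{\mathrm{row}}) = \frac{\log\big((w_{\mathrm{col}} - 1)^2\big)}{\log\big((w_{\mathrm{col}} - 1)(w_{\mathrm{row}} - 1)\big)} < 1. \tag{56} \]

Proof: See Sec. A.13.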

□ 
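To get a feeling for the bound in (55), the following sketch evaluates $\beta'$, $\beta$, and the resulting relative bound for a (3,6)-regular code. The function names are our own; the formulas are those of (56).

```python
import math


def beta_prime(w_col, w_row):
    """Constant factor of the bound; w_row is unused but kept for symmetry."""
    return w_col * (w_col - 1) / (w_col - 2)


def beta(w_col, w_row):
    """Growth exponent of the bound; strictly below 1 for 3 <= w_col < w_row."""
    return math.log((w_col - 1) ** 2) / math.log((w_col - 1) * (w_row - 1))


def pseudo_weight_bound(n, w_col, w_row):
    """Upper bound beta' * n^beta on the minimum AWGNC pseudo-weight."""
    return beta_prime(w_col, w_row) * n ** beta(w_col, w_row)


relative_bounds = {
    n: pseudo_weight_bound(n, 3, 6) / n for n in (10**3, 10**4, 10**5, 10**6)
}
```

For (3,6)-regular codes this gives $\beta' = 6$ and $\beta \approx 0.602$, so the relative bound shrinks as the block length grows.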


Note that this proposition excludes two types of $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular codes. The first type
is the family of codes where $w_{\mathrm{col}} = 2$, also known as cycle codes. In that case a much better
upper bound can be given: the minimum distance, and therefore also the minimal AWGNC
pseudo-weight, grows at best only logarithmically in the block length $n$.

The second type of codes that were excluded are families of codes where $w_{\mathrm{col}} \ge w_{\mathrm{row}}$.
Note however that randomly generated $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC codes of this kind are not too
interesting since the dimension of the code will be zero or near-zero with high probability.
Nevertheless, let us mention that there are interesting and practically useful families of algebraically
constructed $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular codes where the rate does not vanish, e.g. [61].

Corollary 50 Consider a sequence of $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC codes, $3 \le w_{\mathrm{col}} < w_{\mathrm{row}}$,
whose length goes to infinity. The relative minimum AWGNC pseudo-weight (i.e. the ratio
of minimum pseudo-weight to code length) must go to zero. This is in sharp contrast to the
fact that the relative minimum Hamming weight of a randomly generated $(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular
LDPC code, $3 \le w_{\mathrm{col}} < w_{\mathrm{row}}$, is lower bounded by a nonzero number with probability one for
$n \to \infty$ [2].

Let us finish this section with two observations. The first observation is about the “strange”
shape of the fundamental cone. Using Lemma 40 we see that Prop. 49 says that for families of
$(w_{\mathrm{col}}, w_{\mathrm{row}})$-regular LDPC codes there are pseudo-codewords (i.e. vectors in the fundamental
cone) whose angle with the all-ones vector goes to 90° for $n \to \infty$. However, none of the
polytopes associated to this family of codes contains the vector $(w_{\mathrm{row}} - 1 + \varepsilon, 1, \ldots, 1)$, where
$\varepsilon > 0$, yet the angle of this vector with the all-ones vector goes to 0° for $n \to \infty$.

The second observation is that the BEC pseudo-weight of the canonical completion with
respect to any variable node equals the block length. This means that although the
fundamental cone characterizes the pseudo-codewords for the AWGNC and the BEC, the worst-case
pseudo-codewords within the fundamental cone might be quite different depending on the
channel.

8 The Relationship of the Fundamental Polytope to other 
Concepts that Explain the Behavior of Iterative Decoding 

As we mentioned in the introduction to the paper, a variety of concepts have been introduced 
in the past that try to explain the behavior of MPID. In this section we would like to show 
how some of these are related to the fundamental polytope and the various pseudo-weights. 

8.1 Stopping Sets 

Let us recall the definition of a stopping set [22] for a Tanner graph $T$. A subset $S$ of the
variable nodes of $T$ is called a stopping set if and only if every check node in the neighborhood
$\partial(S)$ is connected to at least two variable nodes in $S$. Stopping sets are a means to understand
the suboptimal behavior of iterative decoding techniques for the BEC; in fact, they completely
characterize iterative decoding in that case. It has been observed later that stopping sets seem to also
reflect, to some degree, the performance of iteratively decoded codes for other channels.

Proposition 51 On the one hand, if $\omega \in \mathcal{P}(H)$ then $\operatorname{supp}(\omega)$ is a stopping set of $T(H)$. On
the other hand, if $S$ is a stopping set of $T(H)$ then there exists a vector $\omega \in \mathcal{P}(H)$ such that
$\operatorname{supp}(\omega) = S$.

Proof: See Sec. A. 14. □ 
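The stopping-set condition recalled above is straightforward to test. The sketch below is our own illustration; the parity-check matrix is a hypothetical length-7 Hamming-code matrix chosen only as test input.

```python
def is_stopping_set(H, S):
    """True if no check node (row of H) has exactly one neighbor in S."""
    members = set(S)
    for row in H:
        hits = sum(1 for i in members if row[i])
        if hits == 1:
            return False
    return True


# Hypothetical parity-check matrix used only as test input (assumption).
H = [
    [1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 1, 1],
]

single_node = is_stopping_set(H, {0})        # one check sees it exactly once
three_nodes = is_stopping_set(H, {3, 5, 6})  # every adjacent check sees >= 2
whole_set = is_stopping_set(H, set(range(7)))
```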

In the light of Prop. 51 it seems quite intuitive that the BEC pseudo-weight of a vector
$\omega \in \mathcal{P}(H)$ is defined to be $w_p^{\mathrm{BEC}}(\omega) = |\operatorname{supp}(\omega)|$, see Def. 34, but we will not go into the
details here.

While the notion of stopping set is well suited to the BEC it is not refined enough to
capture the situation for the AWGN channel. Consider the parity-check matrix $H$ whose
Tanner graph $T = T(H)$ is shown in Fig. 23 and whose fundamental cone is

\[ \mathcal{K}(H) = \{\alpha_1 \cdot (2, 2, 1, 1) + \alpha_2 \cdot (1, 1, 2, 2) \mid \alpha_1, \alpha_2 \in \mathbb{R}_+\}. \]

While all the non-zero vectors in $\mathcal{K}(H)$ have BEC pseudo-weight 4 (i.e. their supports yield
stopping sets of size 4), the AWGNC pseudo-weight is usually smaller than 4, e.g. the two
minimal pseudo-codewords $(2, 2, 1, 1)$ and $(1, 1, 2, 2)$ have AWGNC pseudo-weight 3.6.
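The numbers in this example can be reproduced directly from the two generators of the cone. The sketch below (helper names are our own) evaluates both pseudo-weights at a few points of the cone.

```python
from fractions import Fraction


def awgnc_pseudo_weight(omega):
    """AWGNC pseudo-weight ||omega||_1^2 / ||omega||_2^2 of a non-zero vector."""
    s1 = sum(omega)
    s2 = sum(w * w for w in omega)
    return s1 * s1 / s2


def bec_pseudo_weight(omega):
    """BEC pseudo-weight: size of the support."""
    return sum(1 for w in omega if w != 0)


def cone_point(alpha1, alpha2):
    """Point alpha1 * (2,2,1,1) + alpha2 * (1,1,2,2) of the fundamental cone."""
    return [alpha1 * g + alpha2 * h for g, h in zip((2, 2, 1, 1), (1, 1, 2, 2))]


generator = cone_point(Fraction(1), Fraction(0))
midpoint = cone_point(Fraction(1), Fraction(1))
awgnc_generator = awgnc_pseudo_weight(generator)  # 18/5 = 3.6
awgnc_midpoint = awgnc_pseudo_weight(midpoint)    # 4, attained at (3,3,3,3)
```

The BEC pseudo-weight stays at 4 throughout the cone, whereas the AWGNC pseudo-weight ranges between 3.6 at the generators and 4 on the all-equal ray.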

8.2 Near Codewords and Trapping Sets 

Figure 23: Tanner graph $T$.

Near-codewords were introduced by MacKay and Postol [23]: a vector $x \in \mathbb{F}_2^n$ is called a
$(w, w')$ near-codeword in a Tanner graph $T = T(H)$ with $n$ variable nodes if $w_H(x) = w$
and $w_H(s) = w'$ where $s = x \cdot H^{\mathsf{T}}$ (in $\mathbb{F}_2$) is the syndrome of $x$ with respect to $H$. In other
words, the graph induced by the $w$ non-zero components of $x$ contains $w'$ check nodes of odd
degree. Richardson’s definition of trapping sets is essentially identical [24]; $x \in \mathbb{F}_2^n$ is a $(w, w')$
near-codeword if and only if $\operatorname{supp}(x)$ is a $(w, w')$ trapping set.
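The parameters $(w, w')$ of a near-codeword follow directly from the definition. The sketch below is an illustration under our own assumptions: the function name and the parity-check matrix (a hypothetical length-7 Hamming-code matrix) are not taken from the text.

```python
def near_codeword_parameters(H, x):
    """Return (w, w'): Hamming weight of x and Hamming weight of its syndrome."""
    w = sum(x)
    syndrome = [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]
    return w, sum(syndrome)


# Hypothetical parity-check matrix used only as test input (assumption).
H = [
    [1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 1, 1],
]

codeword = [1, 1, 0, 1, 0, 0, 0]        # satisfies all three checks
unit_first = [1, 0, 0, 0, 0, 0, 0]      # column of H has weight 1
unit_sixth = [0, 0, 0, 0, 0, 1, 0]      # column of H has weight 3

params_codeword = near_codeword_parameters(H, codeword)
params_first = near_codeword_parameters(H, unit_first)
params_sixth = near_codeword_parameters(H, unit_sixth)
```

A codeword is a $(w, 0)$ near-codeword, and a weight-one vector has $w'$ equal to the weight of the corresponding column of $H$.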

As was remarked in [23]: “near codewords with small w' tend to be error states from which
the sum-product decoding algorithm cannot escape.” Therefore it is important to understand
the $(w, w')$ near-codewords that have low $w$ and low $w'$. To exemplify this with a simple,
albeit extreme, example, consider an LDPC code $C$ represented by a parity-check matrix $H$.
Fix some $i' \in \mathcal{I}$ and let $x \in \mathbb{F}_2^n$ be a vector where $x_{i'} = 1$ and $x_i = 0$ for $i \in \mathcal{I} \setminus \{i'\}$. It is
easy to check that $x$ is a $(1, w')$ near-codeword where $w'$ equals the Hamming weight of the $i'$-th
column of $H$. In fact, it can cause problems when transmitting over an AWGNC. Assume that
the all-zeros codeword is transmitted ($+\sqrt{E_c}$ after modulation) and that the noise vector is
the all-zeros vector except for the $i'$-th position, which is negative. If it is negative enough then
MPID will decide wrongly.

A connection between near-codewords and trapping sets on the one hand and
pseudo-codewords on the other hand can be made in the following ways. One way is to find the
pseudo-codeword in the fundamental cone that is the closest to a $(w, w')$ near-codeword $x$. If
$w'$ is small, only small changes have to be applied to the components of the vector $x$ to get a
pseudo-codeword. Alternatively, when trying to assign a pseudo-codeword to a near-codeword
one might want to apply the canonical completion that is rooted at the near-codeword.

8.3 Why Four-Cycles are Potentially Bad 

Already people like Wiberg realized that for MPID to work well one should have Tanner
graphs that look locally tree-like, which means that the girth of a graph should be reasonably
large. A first step in that direction is to avoid four-cycles. In this subsection we would like
to explore what the fundamental-polytope view can contribute to this topic.

Footnote (on avoiding four-cycles): Note though that some researchers have studied algebraically-constructed Tanner graphs with girth four,
see e.g. [62, 63, 64], and exhibited some codes which work very well under iterative decoding.

A simple observation towards this goal is the following: considering the proof of Prop. 49
we see that the smaller the girth of the graph is, the smaller the AWGNC pseudo-weight of the
canonical completion can be made.

A different avenue is pursued by the following lemma and its corollaries, which explore the
effect of girth on the fundamental polytope upon adding redundant rows to a parity-check
matrix.

Lemma 52 Let $C$ be a code with parity-check matrix $H$. Basic coding theory tells us that the
modified parity-check matrix

\[ H' = \begin{pmatrix} H \\ a H \end{pmatrix} \quad (\text{in } \mathbb{F}_2), \]

where $a \in \mathbb{F}_2^{|\mathcal{J}(H)|}$ is an arbitrary vector, defines the same code $C$. If the Tanner graph $T(H)$
of $H$ is a forest, i.e. cycle-free, then $\mathcal{P}(H) = \mathcal{P}(H')$.

Proof: See Sec. A. 15. □ 

Note that in the absence of cycle-freeness of $T(H)$ one can easily exhibit a vector $a$ where
$\mathcal{P}(H') \subsetneq \mathcal{P}(H)$.

Corollary 53 Similar to Lemma 52, consider a code $C$ with parity-check matrix $H$ and a
modified parity-check matrix $H'$, where $a \in \mathbb{F}_2^{|\mathcal{J}(H)|}$ is an arbitrary vector. However, now we do
not require that $T(H)$ is a forest. Let $H_1$ be the $|\operatorname{supp}(a)| \times n$ submatrix of $H$ where we
include the $j$-th row if and only if $a_j \ne 0$. If the Tanner graph $T(H_1)$ of $H_1$ is a forest,
i.e. cycle-free, then $\mathcal{P}(H) = \mathcal{P}(H')$.

Proof: See Sec. A. 16. □ 

Corollary 54 Let $C$ be a code with parity-check matrix $H$. Basic coding theory tells us that
the modified parity-check matrix

\[ H' = \begin{pmatrix} H \\ A H \end{pmatrix}, \]

where $A$ is an arbitrary matrix over $\mathbb{F}_2$ with $|\mathcal{J}(H)|$ columns, defines the same code $C$. For
each row $r$ of $A$, let $H_r$ be the submatrix of $H$ where we include the $j$-th row of $H$ if $r_j = 1$.
If $T(H_r)$ is a cycle-free Tanner graph for all rows $r$ of $A$, then $\mathcal{P}(H) = \mathcal{P}(H')$.

Proof: See Sec. A. 17. □ 

Lemma 52 and its corollaries have some important consequences. (Similar observations were
also made by Wainwright [65].)

• Let $H$ be a parity-check matrix of a code $C$ where the Tanner graph $T(H)$ has girth
six. We can create a new parity-check matrix $H'$ that describes the same code in the
following way: let $H'$ consist of all rows of $H$ and the modulo-2 sums of all pairs of
rows of $H$. Then $\mathcal{P}(H) = \mathcal{P}(H')$. (This observation follows from the fact that girth six
for $T(H)$ implies that the Tanner graph of the two-row submatrix is cycle-free for all pairs
of rows of $H$.) Note that applying the same procedure to Tanner graphs $T(H)$ with girth
four will usually lead to $\mathcal{P}(H') \subsetneq \mathcal{P}(H)$.

• More generally, let $H$ be a parity-check matrix of a code $C$ where the Tanner graph
$T(H)$ has girth $g$. We can create a new parity-check matrix $H'$ that describes the same
code in the following way: let $H'$ consist of all rows of $H$, the modulo-2 sums of all
pairs of rows of $H$, ..., the modulo-2 sums of all $(g - 2)/2$-tuples of rows of $H$. Then
$\mathcal{P}(H) = \mathcal{P}(H')$.

• The above observations have some interesting consequences for $\mathcal{P}_r(H)$ as defined in
Def. 11: if $T(H)$ has girth $g$ then $\mathcal{P}_r(H) = \mathcal{P}(H)$ for $r \le (g - 2)/2$. This means that
the larger the girth of the Tanner graph $T(H)$ is, the more codewords from the dual
code have to be added to the parity-check matrix so that the fundamental polytope
changes. Parity-check matrices whose Tanner graphs have large girth therefore possess
a good complexity-approximation tradeoff: it takes much more effort to get a better
approximation of $\operatorname{conv}(C)$.
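The row-augmentation step described in the first item can be sketched as follows. This is only an illustration under our own assumptions: the small matrix is a hypothetical example whose rows pairwise share at most one column (so its Tanner graph has no four-cycles), and the code merely builds $H'$; it does not verify equality of the fundamental polytopes.

```python
from itertools import combinations


def has_four_cycle(H):
    """A four-cycle exists iff two rows share at least two columns."""
    for a, b in combinations(H, 2):
        if sum(x & y for x, y in zip(a, b)) >= 2:
            return True
    return False


def add_pairwise_row_sums(H):
    """Return H' consisting of all rows of H followed by the modulo-2 sums
    of all pairs of rows of H."""
    rows = [list(r) for r in H]
    for a, b in combinations(H, 2):
        rows.append([(x + y) % 2 for x, y in zip(a, b)])
    return rows


# Hypothetical parity-check matrix without four-cycles (assumption).
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
]

H_prime = add_pairwise_row_sums(H)
```

Since every added row is a sum of rows of $H$, the matrix $H'$ has the same row space and therefore defines the same code.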

The above considerations show that large girth seems to be a desirable design criterion
when constructing LDPC codes. This supports for example the type of random LDPC code
constructions as presented by Hu et al. in [66]. It is certainly also a desirable criterion when
designing algebraically constructed LDPC codes. Nevertheless, one has to be careful beyond
having simply a large girth: a Tanner graph with a cycle structure that is “too nice” can lead
to either low-weight codewords (which is very bad) or low-weight pseudo-codewords (which
might potentially be detected and avoided in a decoder). E.g. in the case of the Margulis
construction with Ramanujan graphs one has large girth but also a minimum distance of 24
for n = 4896 [67, 23]. Obviously, adding any possible better constraints does not help as
this minimum codeword will always be included. Although the original Margulis codes [6]
do not seem to have low-weight codewords they exhibit some near-codewords [23]. These
near-codewords might be avoided using better relaxations.

Another word of caution: when adding redundant rows to a parity-check matrix it is clear
that the decoding performance of GCD and LPD can only become better. The question remains
to what extent GCD is still a good model of MPID when the parity-check matrix contains many
more rows than columns. (Some initial explorations in this direction were presented in [68].)

9 Conclusions 

We have introduced graph-cover decoding, a theoretical tool that helps to establish a bridge 
between linear-programming decoding and message-passing iterative decoding and explains 
why they perform similarly. The central object behind these decoding algorithms is the 
fundamental polytope which is a function of the graphical representation of the code (and 
not of the channel). Therefore, different representations of the same code yield (potentially) 
different fundamental polytopes. Vectors inside the fundamental polytope are called pseudo¬ 
codewords and their influence is measured by the pseudo-weight, a function that depends on 
the pseudo-codeword and the channel law. For all the cases where the behavior of message¬ 
passing decoding is known analytically, the graph-cover decoder gives the correct predictions 
and for the other cases the graph-cover decoder seems to be a good model of the behavior of 
message-passing decoding. Moreover, there are connections to Bethe free energy, the marginal 
polytope, and the metric polytope. 

Some of the questions for future research that should be addressed are as follows. First,
given a code and its representation, what analytical and computational tools can be used to
characterize the fundamental polytope? (Some initial work in this direction was presented
in [69, 70] where a lower bound on the AWGNC pseudo-weight was given.) Secondly, how can
one construct codes on graphs whose fundamental polytopes have good properties? Thirdly,
one can always change the Tanner graph of a code, e.g. by repeating a check many times, so
that the fundamental polytope and therefore also the linear programming decoding
performance remain the same whereas the iterative decoding performance will change. So, to
what degree is graph-cover decoding a good model for message-passing decoding? (Some
initial work in this direction was presented in [33] and [68].)

A Proofs 

This appendix contains a variety of proofs that were used in the main text. 

A.1 Proof of Proposition 10

We prove Prop. 10 in three major steps. First, Lemma 55 will show that $\mathcal{Q}(H)$ is a subset of
$\mathcal{P}(H)$. Secondly, Lemma 56 will prove that if a point in $\mathcal{P}(H)$ has only rational entries then
it must also be in $\mathcal{Q}(H)$. Thirdly, Lemma 58 will prove that all vertices of $\mathcal{P}(H)$ are vectors
with rational entries. Eq. (14) is then a simple consequence of these first two lemmas, (15)
is a simple consequence of (14), and the statement that all vertices of $\mathcal{P}(H)$ are in $\mathcal{Q}(H)$ is
a consequence of the third lemma.

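Lemma 55 It holds that

\[ \mathcal{Q}(H) \subseteq \mathcal{P}(H). \]

Proof: Let $\tilde{T}$ be any $M$-fold cover of $T(H)$ and let $\tilde{C} = C(\tilde{T})$. Because of (11), if we can show
that $\omega(\tilde{x}) \in \operatorname{conv}(C_j)$ for all $\tilde{x} \in C(\tilde{T})$ and for all $j \in \mathcal{J}$ we are done. Fix some $\tilde{x} \in C(\tilde{T})$
and some $j \in \mathcal{J}$. As we saw in the remarks after Ex. 4, the Tanner graph $\tilde{T}$ defines some
permutations $\pi_{j,i}$ for all $i \in \mathcal{I}_j$ and so $\tilde{x}$ fulfills

\[ \sum_{i \in \mathcal{I}_j} \tilde{x}_{i, \pi_{j,i}(m)} = 0 \quad (\text{in } \mathbb{F}_2) \tag{57} \]

for all $m \in [M]$. In order to simplify the following expressions, let us introduce some dummy
permutations $\pi_{j,i}$ for all $i \in \mathcal{I} \setminus \mathcal{I}_j$. Then, for $m \in [M]$, let us define the vectors $x^{(m)} \in \mathbb{R}^n$
with

\[ x_i^{(m)} = \tilde{x}_{i, \pi_{j,i}(m)} \]

for all $i \in \mathcal{I}$. Rewriting (57) as

\[ \sum_{i \in \mathcal{I}_j} x_i^{(m)} = 0 \quad (\text{in } \mathbb{F}_2) \]

we see that $x^{(m)} \in C_j(H)$ for all $m \in [M]$. A convex sum of these $M$ vectors must obviously
lie in $\operatorname{conv}(C_j(H))$:

\[ \frac{1}{M} \sum_{m \in [M]} x^{(m)} \in \operatorname{conv}(C_j(H)). \tag{59} \]

Observing that the $i$-th position of the left-hand side in (59) takes on the value

\[ \Big(\frac{1}{M} \sum_{m \in [M]} x^{(m)}\Big)_i = \frac{1}{M} \sum_{m \in [M]} x_i^{(m)} = \frac{1}{M} \sum_{m \in [M]} \tilde{x}_{i, \pi_{j,i}(m)} = \frac{1}{M} \sum_{m' \in [M]} \tilde{x}_{i, m'} = \omega_i(\tilde{x}), \tag{60} \]

we conclude that $\omega(\tilde{x}) \in \operatorname{conv}(C_j(H))$. Because $\tilde{T}$, $\tilde{x}$, and $j$ were arbitrary, this finishes the
proof.

Note that when $\mathcal{P}(H)$ contains more than one point in $\mathbb{R}^n$ then the subset relationship
between $\mathcal{Q}(H)$ and $\mathcal{P}(H)$ is strict: $\mathcal{Q}(H) \subsetneq \mathcal{P}(H)$. To prove this, simply choose a point $\omega$ in
$\mathcal{P}(H)$ where at least one component is irrational: because all points in $\mathcal{Q}(H)$ have rational
components it follows that $\omega \notin \mathcal{Q}(H)$. (Note that the case where $\mathcal{P}(H)$ contains only one
point in $\mathbb{R}^n$ can only happen for block length $n = 1$ and parity-check matrices like $H = (1)$.)
□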


Lemma 56 If a point in P(H) has only rational entries then it must also be in Q(H). 

The main part of the following proof will consist of an algorithm; Ex. 57 (which can be 
found in the text after this proof) illustrates the involved concepts with the help of a code 
that we have already used earlier on. 

Proof: We will prove this lemma as follows: for an arbitrary point $\nu \in \mathcal{P}(H) \cap \mathbb{Q}^n$ we will
show that there is an $M$-cover $\tilde{T}$ of $T(H)$ such that we can exhibit a codeword $\tilde{x} \in C(\tilde{T})$ such
that $\omega(\tilde{x}) = \nu$.

So, let $\nu \in \mathcal{P}(H) \cap \mathbb{Q}^n$. Because $\nu \in \mathcal{P}(H)$ we have $\nu \in \operatorname{conv}(C_j(H))$ for $j \in \mathcal{J}$. Using
Carathéodory’s Theorem (see e.g. [40, p. 10]), we can conclude that for all $j \in \mathcal{J}$ we can write

\[ \nu = \alpha^{(j)} P^{(j)}, \]

where $P^{(j)}$ is an $(n + 1) \times n$ matrix where the rows represent some vertices of $\operatorname{conv}(C_j(H))$,
i.e. codewords of $C_j(H)$, and where $\alpha^{(j)}$ is a vector of length $n + 1$ where all entries are non-negative
and sum to one. For each $j \in \mathcal{J}$ these statements can be reformulated to

\[ (\nu \;\; 1) = \alpha^{(j)} \, (P^{(j)} \;\; \mathbf{1}^{\mathsf{T}}). \]

This is a system of $n + 1$ equations with $n + 1$ unknowns. Because $\nu \in \mathbb{Q}^n$ and because all
entries of $P^{(j)}$ are either 0 or 1, we can conclude with the help of Cramer’s rule for solving
systems of linear equations (see e.g. [71]) that all entries of $\alpha^{(j)}$ must be rational.

Now we proceed to construct a finite cover $\tilde{T}$ of $T(H)$ and a codeword $\tilde{x} \in C(\tilde{T})$. Let $M$
be a common denominator of all the entries of all the vectors $\alpha^{(j)}$, $j \in \mathcal{J}$: from this we have
that not only $M \alpha^{(j)} \in \mathbb{Z}^{n+1}$, $j \in \mathcal{J}$, but also that $M \nu \in \mathbb{Z}^n$. The graph $\tilde{T}$ shall be an $M$-cover
of $T(H)$ with symbol nodes $(i, m) \in \mathcal{I} \times [M]$ and check nodes $(j, m) \in \mathcal{J} \times [M]$.
The entries of the codeword $\tilde{x}$ shall be

\[ \tilde{x}_{i,m} = \begin{cases} 1 & (i \in \mathcal{I},\ m \in [M \nu_i]) \\ 0 & (\text{otherwise}) \end{cases} \]

It now remains to specify the connection pattern of $\tilde{T}$, i.e. what symbol node is connected to
what check node. Once this pattern is specified, it will be easy to see that $\tilde{T}$ is indeed an
$M$-cover of $T(H)$ and that $\tilde{x}$ is a codeword in $C(\tilde{T})$. We use the following algorithm:

• For all $j \in \mathcal{J}$ do:

  – Let $m_j = 1$. For all $i \in \mathcal{I}_j$, let $m_i = 1$ and $m'_i = M \nu_i + 1$.

  – For $\ell$ from 1 to $n + 1$ do: for $s$ from 1 to $M \alpha^{(j)}_\ell$ do:

    * For all $i \in \mathcal{I}_j$ do:

      · If $P^{(j)}_{\ell,i} = 1$ then connect $(i, m_i)$ to $(j, m_j)$ and let $m_i = m_i + 1$.

      · If $P^{(j)}_{\ell,i} = 0$ then connect $(i, m'_i)$ to $(j, m_j)$ and let $m'_i = m'_i + 1$.

    * Let $m_j = m_j + 1$.

We leave it to the reader to check that this construction indeed yields the desired graph cover
and codeword.

□

Figure 24: Left: Tanner graph $T(H)$ of the simple binary linear code in Ex. 4. Right: 3-cover
of $T(H)$ as found in Ex. 57. The shading of the symbol nodes indicates the codeword found
in this example.


Example 57 We continue Ex. 4. In Exs. 5 and 7 we saw that the vector $\nu = (\tfrac{2}{3}, \tfrac{2}{3}, \tfrac{2}{3}, 0)$ is
a pseudo-codeword. Let us show how the algorithm in the proof of Lemma 56 handles this
vector. First of all, we must check that $\nu \in \mathcal{P}(H) \cap \mathbb{Q}^n$. This is indeed true. Next, we have
to find the matrices $P^{(1)}$ and $P^{(2)}$. Note that the codes $C_1$ and $C_2$ are the sets

\[ C_1 = \{(0,0,0), (0,1,1), (1,0,1), (1,1,0)\} \times \{0, 1\}
       = \{(0,0,0,0), (0,0,0,1), (0,1,1,0), (0,1,1,1), (1,0,1,0), (1,0,1,1), (1,1,0,0), (1,1,0,1)\}, \]
\[ C_2 = \{0, 1\} \times \{(0,0,0), (0,1,1), (1,0,1), (1,1,0)\}
       = \{(0,0,0,0), (0,0,1,1), (0,1,0,1), (0,1,1,0), (1,0,0,0), (1,0,1,1), (1,1,0,1), (1,1,1,0)\}. \]

For the given vector $\nu$ it turns out that $\nu = \alpha^{(1)} P^{(1)}$ and $\nu = \alpha^{(2)} P^{(2)}$ with

\[ \alpha^{(1)} = \begin{pmatrix} 0 & 0 & \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{pmatrix}, \qquad
   \alpha^{(2)} = \begin{pmatrix} \tfrac{1}{3} & 0 & 0 & 0 & \tfrac{2}{3} \end{pmatrix}, \]
\[ P^{(1)} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix}, \qquad
   P^{(2)} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix}. \]

Note that the first two lines of $P^{(1)}$ and the three middle lines of $P^{(2)}$ are dummy lines so
that $\alpha^{(1)}$ and $\alpha^{(2)}$ have $n + 1 = 5$ entries.

We see that $M = 3$ is the smallest common denominator of all the entries in $\alpha^{(1)}$ and $\alpha^{(2)}$,
therefore let us find a 3-cover of $T(H)$ that has a codeword $\tilde{x} \in C(\tilde{T})$ such that $\omega(\tilde{x}) = \nu$.
Applying the rest of the algorithm in Lemma 56 we find the 3-cover graph $\tilde{T}$ in Fig. 24 (right)
and the codeword $\tilde{x} = (1{:}1{:}0,\ 1{:}1{:}0,\ 1{:}1{:}0,\ 0{:}0{:}0) \in C(\tilde{T})$. □
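The construction in the proof of Lemma 56 can be traced mechanically for this example. The Python sketch below is our own reading of the algorithm, with 0-based copy indices; the parity-check matrix, the convex weights, and the codeword matrices are entered under the assumption that the code of Ex. 4 has the two checks $x_1 + x_2 + x_3 = 0$ and $x_2 + x_3 + x_4 = 0$, as suggested by the sets $C_1$ and $C_2$.

```python
from fractions import Fraction

H = [[1, 1, 1, 0], [0, 1, 1, 1]]  # assumed checks of the code in Ex. 4
nu = [Fraction(2, 3), Fraction(2, 3), Fraction(2, 3), Fraction(0)]
alpha = [
    [Fraction(0), Fraction(0), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
    [Fraction(1, 3), Fraction(0), Fraction(0), Fraction(0), Fraction(2, 3)],
]
P = [
    [[0, 0, 0, 0], [0, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0], [1, 1, 1, 0]],
]
M = 3


def build_cover(H, nu, alpha, P, M):
    """Return (ones, edges). Copies 0..ones[i]-1 of symbol i carry a 1, and
    edges[(j, m_j)] lists the symbol copies (i, m) attached to check copy (j, m_j)."""
    n = len(nu)
    ones = [int(M * v) for v in nu]
    edges = {}
    for j, row in enumerate(H):
        members = [i for i in range(n) if row[i]]
        next_one = {i: 0 for i in members}         # next unused 1-valued copy
        next_zero = {i: ones[i] for i in members}  # next unused 0-valued copy
        m_j = 0
        for l, weight in enumerate(alpha[j]):
            for _ in range(int(M * weight)):
                neighbours = []
                for i in members:
                    if P[j][l][i] == 1:
                        neighbours.append((i, next_one[i]))
                        next_one[i] += 1
                    else:
                        neighbours.append((i, next_zero[i]))
                        next_zero[i] += 1
                edges[(j, m_j)] = neighbours
                m_j += 1
    return ones, edges


def cover_is_valid(H, ones, edges, M):
    """Every check copy has even parity and uses each adjacent symbol copy once."""
    for j, row in enumerate(H):
        members = [i for i in range(len(row)) if row[i]]
        used = {i: set() for i in members}
        for m_j in range(M):
            neighbours = edges[(j, m_j)]
            if sum(1 for i, m in neighbours if m < ones[i]) % 2:
                return False
            for i, m in neighbours:
                used[i].add(m)
        if any(used[i] != set(range(M)) for i in members):
            return False
    return True


ones, edges = build_cover(H, nu, alpha, P, M)
omega = [Fraction(k, M) for k in ones]  # omega(x~) of the constructed codeword
```

The resulting cover need not coincide edge by edge with the one drawn in Fig. 24, since the construction depends on the chosen weights and on the labeling of the copies.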

Lemma 58 All vertices of $\mathcal{P}(H)$ are vectors with rational entries.

Proof: Remember that $\mathcal{P}(H) = \bigcap_{j \in \mathcal{J}(H)} \operatorname{conv}(C_j(H))$ is defined as the intersection of $|\mathcal{J}(H)|$
polytopes. However, all polytopes $\operatorname{conv}(C_j(H))$, $j \in \mathcal{J}(H)$, can be defined with linear
inequalities that involve only integer coefficients, cf. Lemmas 25 and 26. Therefore, also $\mathcal{P}(H)$ can
be defined with linear inequalities that involve only integer coefficients. Now, any vertex of
$\mathcal{P}(H)$ is a point in $\mathcal{P}(H)$ where $n$ inequalities hold with equality and where these $n$ equalities
form a system of linear equations with full rank. Using Cramer’s rule for solving systems of
linear equations (see e.g. [71]) we see that indeed all vertices of $\mathcal{P}(H)$ are vectors with rational
entries. □

A.2 Proof of Proposition 22 

Let $\lambda_i = \lambda_i(y_i)$ be defined as in (17). Let us first prove the following lemma.

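Lemma 59

\[ \hat{\omega}^{\mathrm{GCD}(H)}(y) = \arg\min_{\omega \in \mathcal{Q}(H)} \sum_{i \in \mathcal{I}} \omega_i \lambda_i. \tag{61} \]

Footnote (to Ex. 57): Other choices for $\alpha^{(j)}$ and $P^{(j)}$ can also yield $\nu$.

Proof: Let us first rewrite the right-hand side of (30). Because $P_{Y|X}(y^{\uparrow M} \mid 0)$ is a constant
for a given $y$, instead of maximizing $(1/M) \log P_{Y|X}(y^{\uparrow M} \mid \tilde{x})$ in (30) we can also maximize

\[ \frac{1}{M} \log \frac{P_{Y|X}(y^{\uparrow M} \mid \tilde{x})}{P_{Y|X}(y^{\uparrow M} \mid 0)}
   = \frac{1}{M} \sum_{i \in \mathcal{I}} \sum_{m \in [M]} \log \frac{P_{Y_{i,m}|X_{i,m}}(y_i \mid \tilde{x}_{i,m})}{P_{Y_{i,m}|X_{i,m}}(y_i \mid 0)}
   = \frac{1}{M} \sum_{i \in \mathcal{I}} \sum_{m \in [M]} \log \frac{P_{Y_i|X_i}(y_i \mid \tilde{x}_{i,m})}{P_{Y_i|X_i}(y_i \mid 0)} \]
\[ \overset{(*)}{=} -\frac{1}{M} \sum_{i \in \mathcal{I}} \sum_{m \in [M]} \tilde{x}_{i,m} \lambda_i
   = -\sum_{i \in \mathcal{I}} \Big(\frac{1}{M} \sum_{m \in [M]} \tilde{x}_{i,m}\Big) \lambda_i
   = -\sum_{i \in \mathcal{I}} \omega_i(\tilde{x}) \cdot \lambda_i, \]

where at step $(*)$ we used (4). With this we can extend (30) to read

\[ \arg\max_{(M, \tilde{T}, \tilde{x})} \frac{1}{M} \log P_{Y|X}(y^{\uparrow M} \mid \tilde{x})
   = \arg\min_{(M, \tilde{T}, \tilde{x})} \sum_{i \in \mathcal{I}} \omega_i(\tilde{x}) \cdot \lambda_i, \]

where both optimizations range over the set of triples $(M, \tilde{T}, \tilde{x})$ defined in (5). Remembering the
relationship between this set of triples and the set $\mathcal{Q}(H)$, as defined in (5) and (6), respectively,
we can write

\[ \hat{\omega}^{\mathrm{GCD}(H)}(y) = \omega\big(\hat{x}^{\mathrm{GCD}(H)}\big) = \arg\min_{\omega \in \mathcal{Q}(H)} \sum_{i \in \mathcal{I}} \omega_i \cdot \lambda_i, \]

which proves the lemma. □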

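Lemma 59 allows us now to prove Prop. 22. Using the convexity of $\mathcal{P}(H)$ and a result
that we found in Prop. 10, namely that all vertices of $\mathcal{P}(H)$ are in $\mathcal{Q}(H)$, we can extend (61)
to read

\[ \hat{\omega}^{\mathrm{GCD}(H)}(y) = \arg\min_{\omega \in \mathcal{Q}(H)} \sum_{i \in \mathcal{I}} \omega_i \lambda_i = \arg\min_{\omega \in \mathcal{P}(H)} \sum_{i \in \mathcal{I}} \omega_i \lambda_i, \]

which proves the proposition.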

A.3 Proof of Lemma 24 

Let T be an M-cover of T(H) and let x G C(T). We know that Muix) G Z” and from 
Prop. 10 we know that cj(x) G P(H) n £)”■. Because /C(H) = conic(P(H)) we conclude that 
Muj(x) G /C(H). Therefore, Ma^(x) G /C(H) n Z”, which proves the first statement. 

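Similar to the proof of Lemma 55, let us fix some $j \in \mathcal{J}$ and let us associate the vectors
$\mathbf{x}^{(m)}$, $m \in [M]$, to $\tilde{\mathbf{x}}$. There it was shown that $\mathbf{x}^{(m)} \in \mathcal{C}_j(\mathbf{H})$ for all $m \in [M]$. Rewriting (60)
to read $M\boldsymbol{\omega}(\tilde{\mathbf{x}}) = \sum_{m \in [M]} \mathbf{x}^{(m)}$, we see that $M\boldsymbol{\omega}(\tilde{\mathbf{x}}) \in \mathcal{C}_j$ (in $\mathbb{F}_2$). Because $j$ was arbitrary
and because $\mathcal{C} = \bigcap_{j \in \mathcal{J}} \mathcal{C}_j(\mathbf{H})$, we have $\mathcal{Z}(\mathbf{H}) \subseteq \mathcal{C}$ (in $\mathbb{F}_2$). Moreover, it is clear that $\mathcal{Z}(\mathbf{H}) \supseteq \mathcal{C}$
(in $\mathbb{F}_2$). Combining these two results proves the second statement.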

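Let $\boldsymbol{\omega}$ be a minimal pseudo-codeword and consider the half-ray given by $\{\alpha\boldsymbol{\omega} \mid \alpha \in \mathbb{R}_{+}\}$.
Because the fundamental cone $\mathcal{K}(\mathbf{H})$ is the conic hull of the fundamental polytope $\mathcal{P}(\mathbf{H})$
we know that there is a non-zero vertex of the fundamental polytope lying on this half-ray.
However, in Prop. 10 we have seen that all vertices of $\mathcal{P}(\mathbf{H})$ have rational entries and are
therefore also in $\mathcal{Q}(\mathbf{H})$. Looking at one of the pre-images $(M, \tilde{\mathsf{T}}, \tilde{\mathbf{x}}) \in \tilde{\mathcal{Q}}(\mathbf{H})$ of this vertex we
finally see that there must be an $\alpha \in \mathbb{R}_{++}$ such that $\alpha \cdot \boldsymbol{\omega} \in \mathcal{Z}(\mathbf{H})$. This proves the third
statement.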


A.4 Proof of Lemma 28 

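Let us study the set $\mathcal{Q}(\mathbf{H})$ as defined in (6) and (7); Prop. 10, which shows a connection
between $\mathcal{Q}(\mathbf{H})$ and $\mathcal{P}(\mathbf{H})$, will then give the desired result. (Note that we only discuss the
case where $\mathsf{T}(\mathbf{H})$ is a tree. The case where $\mathsf{T}(\mathbf{H})$ is a forest, i.e. a collection of trees, is a
straightforward extension.)

So, let $\tilde{\mathsf{T}}$ be an $M$-cover of $\mathsf{T}(\mathbf{H})$. Because $\mathsf{T}(\mathbf{H})$ is a tree it is easy to see that $\tilde{\mathsf{T}}$ is a
collection of $M$ disjoint trees that are copies of $\mathsf{T}(\mathbf{H})$. With suitable labeling of the vertices
of $\tilde{\mathsf{T}}$ we have $\mathcal{C}(\tilde{\mathsf{T}}) = \{\tilde{\mathbf{x}} \in \mathbb{F}_2^{nM} \mid (\tilde{x}_{1,m}, \ldots, \tilde{x}_{n,m}) \in \mathcal{C} \text{ for all } m \in [M]\}$ and it follows that

$$\mathcal{Q}(\mathbf{H}) = \bigcup_{\tilde{\mathsf{T}}:\ \tilde{\mathsf{T}} \text{ is a finite-cover graph of } \mathsf{T}(\mathbf{H})} \boldsymbol{\omega}\bigl(\mathcal{C}(\tilde{\mathsf{T}})\bigr)$$

equals $\operatorname{conv}(\mathcal{C}) \cap \mathbb{Q}^n$. Using Prop. 10 we see that $\mathcal{P}(\mathbf{H}) = \overline{\mathcal{Q}(\mathbf{H})} = \overline{\operatorname{conv}(\mathcal{C}) \cap \mathbb{Q}^n} = \operatorname{conv}(\mathcal{C})$
as promised.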
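A small sketch of the tree case: the matrix `H` below is an assumed example whose Tanner graph is a tree (two checks sharing one variable node), and the three codewords stand for the three disjoint copies in a 3-cover. The resulting $\boldsymbol{\omega}$ is an average of codewords, hence a rational point of $\operatorname{conv}(\mathcal{C})$. As a side check in the spirit of Lemma 24, $M\boldsymbol{\omega}$ reduced modulo 2 is again a codeword.

```python
from fractions import Fraction

# Assumed example: two checks sharing variable node 3, so the Tanner graph is a tree.
H = [[1, 1, 1, 0, 0],
     [0, 0, 1, 1, 1]]

def is_codeword(h, x):
    """True if every parity check of h is satisfied by x over F_2."""
    return all(sum(a * b for a, b in zip(row, x)) % 2 == 0 for row in h)

def omega(copies):
    """Average of the M codewords sitting on the M disjoint copies of the tree."""
    m_fold = len(copies)
    return [Fraction(sum(column), m_fold) for column in zip(*copies)]

# One codeword per copy of the tree (M = 3).
copies = [(1, 1, 0, 0, 0), (1, 0, 1, 1, 0), (0, 1, 1, 0, 1)]
w = omega(copies)
unscaled = [int(len(copies) * wi) for wi in w]   # M * omega, an integer vector
reduced = [u % 2 for u in unscaled]              # M * omega over F_2
print(w, unscaled, reduced)
```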


A.5 Proof of Statements after Definition 33 

The first statement is proven as follows. Note that u £ [0,1]"'. Let T C T be set of positions 
were the channel bit flips happened. <S'|x=o is non-negative if and only if ^ 0 if 

and only if - uji + Yiei\e ^ 0 if and only if -2 Yi&E + Siex ^ Therefore, 
a necessary condition for 5|x=o to be non-positive is that \£\ ^ ^t(;fi.ac(^)- This follows by 
observing that \£\ ^ Yie£‘^i ^ k = |'»^frac(^<^)- 

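The second statement follows by replacing $\boldsymbol{\omega}$ by $\boldsymbol{\omega} / \|\boldsymbol{\omega}\|_{\infty}$ in the above argument and by
observing that $\boldsymbol{\omega} / \|\boldsymbol{\omega}\|_{\infty} \in [0,1]^n$.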


A.6 Proof of Lemma 39 

The expressions in the lemma are obtained doing the following manipulations: 


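$$
w_{\mathrm{p}}(\boldsymbol{\omega})
= \frac{|\mathcal{S}|^2 \, (\mathrm{E}[\Omega])^2}{|\mathcal{S}| \cdot \mathrm{E}[\Omega^2]}
= |\mathcal{S}| \cdot \frac{(\mathrm{E}[\Omega])^2}{\mathrm{E}[\Omega^2]}
= |\mathcal{S}| \cdot \frac{1}{\frac{\mathrm{E}[\Omega^2] - (\mathrm{E}[\Omega])^2}{(\mathrm{E}[\Omega])^2} + 1}
= |\mathcal{S}| \cdot \frac{1}{\frac{\mathrm{Var}[\Omega]}{(\mathrm{E}[\Omega])^2} + 1}.
\tag{62}
$$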
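The manipulations in (62) can be verified with exact arithmetic. The sketch below assumes the usual AWGNC pseudo-weight $\|\boldsymbol{\omega}\|_1^2 / \|\boldsymbol{\omega}\|_2^2$ and takes $\Omega$ to be uniformly distributed over the entries of $\boldsymbol{\omega}$ on its support $\mathcal{S}$; the vector `w` is an arbitrary example.

```python
from fractions import Fraction

w = [Fraction(1), Fraction(1, 2), Fraction(1, 2), Fraction(0)]

support = [v for v in w if v != 0]            # entries on S = supp(omega)
size = len(support)                           # |S|
mean = sum(support) / size                    # E[Omega]
second_moment = sum(v * v for v in support) / size   # E[Omega^2]
variance = second_moment - mean ** 2          # Var[Omega]

direct = sum(w) ** 2 / sum(v * v for v in w)  # ||w||_1^2 / ||w||_2^2
via_moments = size * mean ** 2 / second_moment
via_variance = size / (variance / mean ** 2 + 1)
print(direct, via_moments, via_variance)
```

All three expressions agree, which also makes visible that the pseudo-weight equals the support size exactly when the variance of the non-zero entries vanishes.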
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A.7 Proof of Lemma 40 

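From vector analysis it is well known that $\langle \boldsymbol{\omega}, \mathbf{1} \rangle = \|\mathbf{1}\|_2 \, \|\boldsymbol{\omega}\|_2 \cos(\angle(\boldsymbol{\omega}, \mathbf{1}))$. With this, we
can write

$$
w_{\mathrm{p}}(\boldsymbol{\omega})
= \frac{\|\boldsymbol{\omega}\|_1^2}{\|\boldsymbol{\omega}\|_2^2}
= \frac{\langle \boldsymbol{\omega}, \mathbf{1} \rangle^2}{\|\boldsymbol{\omega}\|_2^2}
= \frac{\|\mathbf{1}\|_2^2 \, \|\boldsymbol{\omega}\|_2^2 \cos^2(\angle(\boldsymbol{\omega}, \mathbf{1}))}{\|\boldsymbol{\omega}\|_2^2}
= n \cdot \cos^2(\angle(\boldsymbol{\omega}, \mathbf{1})).
\tag{63}
$$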


The proof of the second part of the lemma statement is analogous. 
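As a quick numerical illustration of the first part, the sketch below compares the ratio $\|\boldsymbol{\omega}\|_1^2 / \|\boldsymbol{\omega}\|_2^2$ with $n \cos^2$ of the angle between $\boldsymbol{\omega}$ and the all-one vector. The vector `w` is an arbitrary non-negative example and the helper names are introduced here.

```python
import math

def awgnc_pseudo_weight(w):
    """||w||_1^2 / ||w||_2^2 for a non-negative, non-zero vector w."""
    return sum(w) ** 2 / sum(v * v for v in w)

def cos_angle_to_all_one(w):
    """Cosine of the angle between w and the all-one vector."""
    inner = sum(w)                                  # <w, 1>
    norm_w = math.sqrt(sum(v * v for v in w))
    norm_one = math.sqrt(len(w))
    return inner / (norm_w * norm_one)

w = [1.0, 0.5, 0.5, 0.0]
n = len(w)
lhs = awgnc_pseudo_weight(w)
rhs = n * cos_angle_to_all_one(w) ** 2
print(lhs, rhs)
```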


A.8 Proof of Lemma 41 

We only consider the AWGNC pseudo-weight case, the other cases are left to the reader as an 
exercise. The proof for the AWGNC pseudo-weight case is done in two steps: first we prove 
a simplified statement (Lemma 60), then we prove the general case. 

Lemma 60 Consider the same setup as in Lemma 41. Assuming additionally that $\|\boldsymbol{\omega}^{(\ell)}\|_1 = 1$
for all $\ell \in [L]$ and that $\sum_{\ell \in [L]} \alpha_\ell = 1$ we have
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(64) 


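Proof: Let $\boldsymbol{\omega} \triangleq \sum_{\ell \in [L]} \alpha_\ell \boldsymbol{\omega}^{(\ell)}$. Using the assumptions, it is easy to see that $\|\boldsymbol{\omega}\|_1 = 1$.
Moreover,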
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where step (*) follows from the Cauchy-Schwarz inequality. Concluding, 
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where step (*) follows from (66). 

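□

Now we prove Lemma 41. For $\ell \in [L]$, let $\alpha'_\ell \triangleq \alpha_\ell \|\boldsymbol{\omega}^{(\ell)}\|_1 / \sum_{\ell' \in [L]} \alpha_{\ell'} \|\boldsymbol{\omega}^{(\ell')}\|_1$ and let
$\boldsymbol{\omega}'^{(\ell)} \triangleq \boldsymbol{\omega}^{(\ell)} / \|\boldsymbol{\omega}^{(\ell)}\|_1$. Note that $\sum_{\ell \in [L]} \alpha'_\ell = 1$ and $\|\boldsymbol{\omega}'^{(\ell)}\|_1 = 1$, $\ell \in [L]$. Then

$$
w_{\mathrm{p}}(\boldsymbol{\omega})
\overset{(*)}{=} w_{\mathrm{p}}\!\left( \frac{\sum_{\ell \in [L]} \alpha_\ell \boldsymbol{\omega}^{(\ell)}}{\sum_{\ell' \in [L]} \alpha_{\ell'} \|\boldsymbol{\omega}^{(\ell')}\|_1} \right)
= w_{\mathrm{p}}\!\left( \sum_{\ell \in [L]} \alpha'_\ell \, \boldsymbol{\omega}'^{(\ell)} \right)
\tag{68}
$$

$$
\overset{(**)}{\geq} \min\bigl\{ w_{\mathrm{p}}(\boldsymbol{\omega}^{(1)}), \ldots, w_{\mathrm{p}}(\boldsymbol{\omega}^{(L)}) \bigr\},
\tag{69}
$$

where at step (*) we used the scaling-invariance of $w_{\mathrm{p}}(\cdot)$ and at step (**) we used the above
lemma and the fact that $w_{\mathrm{p}}(\boldsymbol{\omega}'^{(\ell)}) = w_{\mathrm{p}}(\boldsymbol{\omega}^{(\ell)})$ for $\ell \in [L]$.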
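The statement that the AWGNC pseudo-weight of a non-negative combination is at least the smallest pseudo-weight of its constituents can be probed with random trials. This is only a sanity check under the assumption $w_{\mathrm{p}}(\boldsymbol{\omega}) = \|\boldsymbol{\omega}\|_1^2 / \|\boldsymbol{\omega}\|_2^2$; the random non-negative vectors below are not required to be pseudo-codewords of any particular code.

```python
import random

def awgnc_pseudo_weight(w):
    """||w||_1^2 / ||w||_2^2 for a non-negative, non-zero vector w."""
    return sum(w) ** 2 / sum(v * v for v in w)

def combine(vectors, alphas):
    """Non-negative linear combination sum_l alpha_l * vectors[l]."""
    dim = len(vectors[0])
    return [sum(a * v[i] for a, v in zip(alphas, vectors)) for i in range(dim)]

def random_nonnegative_vector(rng, dim):
    """Sparse non-negative vector with at least one strictly positive entry."""
    v = [rng.random() * rng.randint(0, 1) for _ in range(dim)]
    v[rng.randrange(dim)] = rng.random() + 0.01
    return v

rng = random.Random(1)
violations = 0
for _ in range(2000):
    vectors = [random_nonnegative_vector(rng, 6) for _ in range(3)]
    alphas = [rng.random() + 0.01 for _ in vectors]
    combined = awgnc_pseudo_weight(combine(vectors, alphas))
    smallest = min(awgnc_pseudo_weight(v) for v in vectors)
    if combined < smallest - 1e-9:
        violations += 1
print(violations)
```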


A.9 Proof of Lemma 42 

Proof: Let u = . Note that the assumptions in the lemma statement imply that 

lli^ll^ = 1. The inequality follows then by using partial results of the proof of Lemma 60. 
Specifically, we use (65) which says that 



or, equivalently. 
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For = 1 and ||^||f^ = 1 we have ||^'||2 = Xj^Wp{y) and = 1 /\JWp{u:^^'^), 

respectively, and the result follows then immediately from the assumptions in the lemma 
statement and the above considerations. □ 


A.10 Proof of Lemma 43

Proof: We have 




d , , 

— Wp(u) = 
duJi dlVi 
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I'e n 






The lemma follows then by analyzing the expression in the square brackets. 


□ 


A.11 Proof of Lemma 44

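Proof: For $\boldsymbol{\omega} = \mathbf{0}$ the statement is trivial. So, assume that $\boldsymbol{\omega} \neq \mathbf{0}$. Because we assume in the
lemma that $\boldsymbol{\omega} \leq \mathbf{1}$ we must have $\max_{i \in [n]} \omega_i \leq 1$, which proves the first inequality in (48).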
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The second inequality in (48) follows upon observing that 

'f^max—frac(^) — 
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(maxig[„] Wi) (Eie[n] 

Sie[n] niaxj/g[,.^] Wj/) 


. .,.,2 

= Wp{u}. 



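The third inequality in (48) can be proven as follows. Let $\mathbf{1}_{\boldsymbol{\omega}}$ be the indicator vector of $\boldsymbol{\omega}$,
i.e. the $i$-th position is 1 if $\omega_i$ is non-zero and it is 0 otherwise. Then, using the Cauchy-Schwarz
inequality we see that $\|\boldsymbol{\omega}\|_1^2 = \langle \boldsymbol{\omega}, \mathbf{1} \rangle^2 = \langle \boldsymbol{\omega}, \mathbf{1}_{\boldsymbol{\omega}} \rangle^2 \leq \|\boldsymbol{\omega}\|_2^2 \cdot \|\mathbf{1}_{\boldsymbol{\omega}}\|_2^2 = \|\boldsymbol{\omega}\|_2^2 \cdot |\operatorname{supp}(\boldsymbol{\omega})| =
\|\boldsymbol{\omega}\|_2^2 \cdot w_{\mathrm{p}}^{\mathrm{BEC}}(\boldsymbol{\omega})$ and dividing by $\|\boldsymbol{\omega}\|_2^2$ yields the desired expression.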

The inequalities in (50) follow from the inequalities in (48) by observing that tCp( •) and 
•) are scaling-invariant and therefore, when finding u;“™(H)( •) and r(;p^^’™“(H)( •), 
it is sufficient to minimize over the non-zero vertices of the fundamental polytope. 

The first inequality in (49) is the same as the first inequality in (48). In order to prove the
second inequality in (49) consider the functions $f(\cdot)$ and $F(\cdot)$ and the value $e$ in Def. 32. On
the one hand, the area under $f(\cdot)$ from 0 to $e$ equals $(1/2) \cdot F(n) = (1/2) \cdot \|\boldsymbol{\omega}\|_1$ by definition.
On the other hand, because $f(\cdot)$ is non-increasing, the same area is upper bounded by
$e \cdot \|\boldsymbol{\omega}\|_{\infty}$. Solving for $2e$ we obtain $2e \geq \|\boldsymbol{\omega}\|_1 / \|\boldsymbol{\omega}\|_{\infty}$.

The third inequality in (49) is obtained as follows. First, note that $F(|\operatorname{supp}(\boldsymbol{\omega})|) = F(n)$. Secondly, consider the chord
from $(0, F(0) = 0)$ to $(|\operatorname{supp}(\boldsymbol{\omega})|, F(|\operatorname{supp}(\boldsymbol{\omega})|) = \|\boldsymbol{\omega}\|_1)$. Because $F(\cdot)$ is concave, the chord
is always below $F(\cdot)$ in the domain of interest. Therefore, $F(e)$, which by definition must be
equal to $(1/2) \cdot \|\boldsymbol{\omega}\|_1$, is not smaller than $(\|\boldsymbol{\omega}\|_1 / |\operatorname{supp}(\boldsymbol{\omega})|) \cdot e$. Combining these observations
we obtain $2e \leq |\operatorname{supp}(\boldsymbol{\omega})|$.

The inequalities in (51) follow from the inequalities in (49) by observing that Wp{-) and 
■) are scaling-invariant and therefore, when finding u;™'“(H)( •) and u;p^*^’™'“^(H)( •), 
it is sufficient to minimize over the non-zero vertices of the fundamental polytope. □ 
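The chain in (48) can be sanity-checked numerically. The sketch below reads (48), based on the proof text above, as $\|\boldsymbol{\omega}\|_1 \leq \|\boldsymbol{\omega}\|_1 / \|\boldsymbol{\omega}\|_{\infty} \leq \|\boldsymbol{\omega}\|_1^2 / \|\boldsymbol{\omega}\|_2^2 \leq |\operatorname{supp}(\boldsymbol{\omega})|$ for non-zero $\boldsymbol{\omega} \in [0,1]^n$; identifying these four quantities with the fractional weight, the max-fractional weight, the AWGNC pseudo-weight and the BEC pseudo-weight is an assumption made here, and the function names are illustrative.

```python
import random

def weights(w):
    """Return (||w||_1, ||w||_1/||w||_inf, ||w||_1^2/||w||_2^2, |supp(w)|)."""
    l1 = sum(w)
    linf = max(w)
    l2_squared = sum(v * v for v in w)
    support_size = sum(1 for v in w if v > 0)
    return l1, l1 / linf, l1 * l1 / l2_squared, support_size

def chain_holds(w, tol=1e-9):
    """Check the three inequalities of the chain, up to a float tolerance."""
    frac, max_frac, awgnc, bec = weights(w)
    return (frac <= max_frac + tol
            and max_frac <= awgnc + tol
            and awgnc <= bec + tol)

rng = random.Random(0)
samples = []
for _ in range(1000):
    # Sparse random vector in [0, 1]^8; all-zero vectors are skipped.
    w = [rng.random() if rng.random() < 0.6 else 0.0 for _ in range(8)]
    if any(w):
        samples.append(w)
all_ok = all(chain_holds(w) for w in samples)
print(weights([1.0, 0.5, 0.5, 0.0]), all_ok)
```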


A.12 Proof of Lemma 48

The expressions in (53) and (54) for the AWGNC pseudo-weight are an immediate consequence
of Defs. 45 and 46. Our main task is therefore to show that $\boldsymbol{\omega} \in \mathcal{K}(\mathbf{H})$. To that end, let us
use the fundamental cone description of Lemma 26. It is obvious that $\omega_i \geq 0$ for all $i \in \mathcal{I}(\mathbf{H})$.
Now, consider a check node $B_j$ at tier $2t+1$ for some $t \geq 0$ that is connected to variable
nodes at tier $2t$ and possibly some variable nodes at tier $2t+2$. We distinguish two cases:

• The check node $B_j$ is connected to only one variable node, say $X_{i_1}$, at tier $2t$, and
$w_{\mathrm{row}} - 1$ variable nodes, say $X_{i_2}, \ldots, X_{i_{w_{\mathrm{row}}}}$, at tier $2t+2$. From Def. 46 it follows
that $\omega_{i_1} = 1/(w_{\mathrm{row}} - 1)^t$ and that $\omega_{i_2} = \cdots = \omega_{i_{w_{\mathrm{row}}}} = 1/(w_{\mathrm{row}} - 1)^{t+1}$. It is easy to
check that $\omega_{i'} \leq \sum_{i \in \mathcal{I}_j \setminus \{i'\}} \omega_i$ is satisfied for all $i' \in \mathcal{I}_j = \{i_1, \ldots, i_{w_{\mathrm{row}}}\}$. Indeed, the
most crucial of them is the one for $i' = i_1$, where we have the inequality $1/(w_{\mathrm{row}} - 1)^t \leq
(w_{\mathrm{row}} - 1) \cdot 1/(w_{\mathrm{row}} - 1)^{t+1}$, which is satisfied with equality.

• The check node $B_j$ is connected to at least two variable nodes, say $X_{i_1}, \ldots, X_{i_h}$, at tier $2t$,
and $w_{\mathrm{row}} - h$ variable nodes, say $X_{i_{h+1}}, \ldots, X_{i_{w_{\mathrm{row}}}}$, at tier $2t+2$, where $2 \leq h \leq w_{\mathrm{row}}$.
From Def. 46 it follows that $\omega_{i_1} = \cdots = \omega_{i_h} = 1/(w_{\mathrm{row}} - 1)^t$ and that $\omega_{i_{h+1}} = \cdots =
\omega_{i_{w_{\mathrm{row}}}} = 1/(w_{\mathrm{row}} - 1)^{t+1}$. It is easy to check that $\omega_{i'} \leq \sum_{i \in \mathcal{I}_j \setminus \{i'\}} \omega_i$ is satisfied for
all $i' \in \mathcal{I}_j = \{i_1, \ldots, i_{w_{\mathrm{row}}}\}$. Actually, unless $w_{\mathrm{row}} = 2$, none of them is satisfied with
equality.

Because the check node $B_j$ was arbitrary, this concludes the proof that $\boldsymbol{\omega} \in \mathcal{K}(\mathbf{H})$.
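The two cases above can be checked mechanically on a small example. The sketch below is an illustration under stated assumptions: `H` is the vertex-edge incidence matrix of the complete graph on four vertices (so every check has degree $w_{\mathrm{row}} = 3$), tiers are computed by breadth-first search from the root variable node, a variable node at tier $2t$ receives the value $1/(w_{\mathrm{row}} - 1)^t$ as in the proof, and the cone test uses the inequalities $\omega_i \geq 0$ and $\omega_{i'} \leq \sum_{i \in \mathcal{I}_j \setminus \{i'\}} \omega_i$. The function names are introduced here and the Tanner graph is assumed to be connected and check-regular.

```python
from collections import deque
from fractions import Fraction

def canonical_completion(h, root):
    """Assign 1/(w_row - 1)^t to every variable node at tier 2t (BFS from root)."""
    n_checks, n_vars = len(h), len(h[0])
    w_row = sum(h[0])  # assumes every row has the same weight
    tier = {("v", root): 0}
    queue = deque([("v", root)])
    while queue:
        kind, idx = queue.popleft()
        if kind == "v":
            neighbors = [("c", j) for j in range(n_checks) if h[j][idx]]
        else:
            neighbors = [("v", i) for i in range(n_vars) if h[idx][i]]
        for node in neighbors:
            if node not in tier:
                tier[node] = tier[(kind, idx)] + 1
                queue.append(node)
    return [Fraction(1, (w_row - 1) ** (tier[("v", i)] // 2))
            for i in range(n_vars)]

def in_fundamental_cone(h, w):
    """Check w_i >= 0 and, per check, w_i' <= sum of the other w_i in that check."""
    if any(wi < 0 for wi in w):
        return False
    for row in h:
        support = [i for i, bit in enumerate(row) if bit]
        total = sum(w[i] for i in support)
        if any(w[i] > total - w[i] for i in support):
            return False
    return True

# Variables are the edges ab, ac, ad, bc, bd, cd; checks are the vertices a, b, c, d.
H = [[1, 1, 1, 0, 0, 0],
     [1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
w = canonical_completion(H, root=0)
print(w, in_fundamental_cone(H, w))
```

In this example the inequality at the two tier-1 checks holds with equality for the root, which matches the first case of the proof.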


A.13 Proof of Proposition 49

Let $\mathsf{T} \triangleq \mathsf{T}(\mathbf{H})$ be the Tanner graph corresponding to $\mathbf{H}$. To prove the upper bound on
$w_{\mathrm{p}}^{\min}(\mathbf{H})$ we proceed as follows. By definition, the AWGNC pseudo-weight of any non-zero
pseudo-codeword is larger than or equal to $w_{\mathrm{p}}^{\min}(\mathbf{H})$. Therefore, any upper bound on the
pseudo-weight of any non-zero pseudo-codeword will yield an upper bound on $w_{\mathrm{p}}^{\min}(\mathbf{H})$.

Our choice for a non-zero pseudo-codeword is a pseudo-codeword that was obtained by
the canonical completion rooted at an arbitrary variable node $V$, see Def. 46. Its AWGNC
pseudo-weight was established in Lemma 48. To get an upper bound on $w_{\mathrm{p}}(\boldsymbol{\omega})$, we need a
lower bound on $\|\boldsymbol{\omega}\|_2$ and an upper bound on $\|\boldsymbol{\omega}\|_1$. We start with the lower bound on $\|\boldsymbol{\omega}\|_2$.

We have^® 

[5(T)/2J / , \ 2 0 / \ 2 

Hli= E >EiV,„(T)(i^) = 1 , (71) 

where we used A^\/,o(T) = 1. A side note: if we can assume that the girth ( 7 (T) of T is at least 
six, we have Ay^o(T) = A™q^(T) and Ay_ 2 (T) = A™|^(T), and therefore we get the better 
lower bound 


Li) 


||2 

II2 



2 


J 

k-1' 


For even larger girth, we could give even better lower bounds, but we will not pursue this any 
further. 

Now we turn to the problem of obtaining an upper bound on II cj 11^ = ■ 

Because Ny^ 2 t{J) ^ for all t ^ 0, this sum is clearly upper bounded by the same sum 

for a Tanner graph which has the same number of variable nodes but which has maximal 
expansion, i.e.. 


L<5(T)/2J t' 

t =0 ^ > t=0 ^ ' 

where we introduced Ay 24 — ^V 2 t 0 ^ t ^ t' where t' is some constant such that 
Yl\=o ^v, 2 t < n = Av/, 2 t(T) < Yft=o^v, 2 f construction, t' will fulfill t' ^ 

In order to shorten the notation used in this proof we will use $j = w_{\mathrm{col}}$ and $k = w_{\mathrm{row}}$.
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[(5(T)/2j. Continuing, 
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Combining (71) and (72) we obtain 
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u;”“(H) ^ u;p(c^) = 
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(73) 


In order to complete the proof, we need an upper bound (as a function of the code size $n$) on
$t'$. Remembering the definition of $t'$, such a bound can be obtained as follows:
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where 7 ^^^ = (j — 1)(^ — !)• Therefore, 
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For k > j we have /3(j, A:) < 1. 


PU,k)^2 


log(j - 1) _ log ((j - 1)^) 
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A. 14 Proof of Proposition 51 

Let us prove the first statement. Because $\boldsymbol{\omega}$ is in the fundamental polytope $\mathcal{P}(\mathbf{H})$, it is also
in the fundamental cone $\mathcal{K}(\mathbf{H})$ and so, for each $j \in \mathcal{J}$ and for each $i' \in \mathcal{I}_j$ it fulfills (see
Def. 27): $\omega_{i'} \leq \sum_{i \in \mathcal{I}_j \setminus \{i'\}} \omega_i$. This means that for all $j \in \mathcal{J}$, if there is an $i' \in \mathcal{I}_j$ such that
$\omega_{i'} > 0$ then there are at least two distinct $i', i'' \in \mathcal{I}_j$ such that $\omega_{i'} > 0$ and $\omega_{i''} > 0$. But this
is equivalent to the condition that each check node in $\partial(\operatorname{supp}(\boldsymbol{\omega}))$ is connected to at least two
variable nodes in $\operatorname{supp}(\boldsymbol{\omega})$.

Let us now prove the second statement. Let $\mathcal{S}$ be a stopping set and let $\boldsymbol{\nu} \in \mathbb{R}^n$ be a
vector where $\nu_i = 1$ if $i \in \mathcal{S}$ and $\nu_i = 0$ otherwise. It can easily be seen that this vector
fulfills all the conditions for being in the fundamental cone $\mathcal{K}(\mathbf{H})$, using e.g. the inequalities
in Lemma 26. Following the comment after Def. 23, there is an $\alpha \in \mathbb{R}_{++}$ (in fact, a whole
interval of $\alpha$'s) such that $\boldsymbol{\omega} = \alpha\boldsymbol{\nu}$ is in the fundamental polytope $\mathcal{P}(\mathbf{H})$.
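Both directions of the argument can be illustrated on a toy example. The parity-check matrix `H` below is an assumed example, `is_stopping_set` and `in_fundamental_cone` are helper names introduced here, and the cone test uses the inequalities $\nu_i \geq 0$ and $\nu_{i'} \leq \sum_{i \in \mathcal{I}_j \setminus \{i'\}} \nu_i$ that the proof refers to.

```python
def is_stopping_set(h, subset):
    """No check node is connected to exactly one variable node of the subset."""
    for row in h:
        hits = sum(1 for i in subset if row[i])
        if hits == 1:
            return False
    return True

def in_fundamental_cone(h, w):
    """Check w_i >= 0 and, per check, w_i' <= sum of the other w_i in that check."""
    if any(wi < 0 for wi in w):
        return False
    for row in h:
        support = [i for i, bit in enumerate(row) if bit]
        total = sum(w[i] for i in support)
        if any(w[i] > total - w[i] for i in support):
            return False
    return True

def indicator(subset, n):
    """0/1 vector of length n that is 1 exactly on the subset."""
    return [1 if i in subset else 0 for i in range(n)]

# Assumed toy parity-check matrix with n = 4 variable nodes.
H = [[1, 1, 1, 0],
     [0, 1, 1, 1]]

stopping = {0, 1, 2}      # a stopping set whose indicator is not a codeword
not_stopping = {0, 1}     # the second check sees only variable 1
print(is_stopping_set(H, stopping), in_fundamental_cone(H, indicator(stopping, 4)))
print(is_stopping_set(H, not_stopping), in_fundamental_cone(H, indicator(not_stopping, 4)))
```

For 0/1 indicator vectors the two tests agree, which is the content of the second statement up to the positive scaling factor $\alpha$.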

A.15 Proof of Lemma 52

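It follows from the definition of the fundamental polytope (Def. 8) and the discussion before
and after (21) that $\operatorname{conv}(\mathcal{C}) \subseteq \mathcal{P}(\mathbf{H}') \subseteq \mathcal{P}(\mathbf{H})$. However, using Lemma 28 we can conclude
that $\mathcal{P}(\mathbf{H}) = \operatorname{conv}(\mathcal{C})$ which proves that $\mathcal{P}(\mathbf{H}) = \mathcal{P}(\mathbf{H}')$ as desired.

An alternative proof would be to show that (under the conditions in the lemma statement)
$\boldsymbol{\omega} \in \mathcal{P}(\mathbf{H})$ implies $\boldsymbol{\omega} \in \mathcal{P}(\mathbf{H}')$ where for $\mathcal{P}(\mathbf{H})$ and $\mathcal{P}(\mathbf{H}')$ we use the description given in
Lemma 26. Some manipulations of the involved inequalities lead to the desired result. We
leave the details to the reader.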


A.16 Proof of Corollary 53

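Let $\bar{\mathbf{H}}_1$ be the matrix that contains the rows of $\mathbf{H}$ that are not included in $\mathbf{H}_1$. We have

$$
\mathcal{P}(\mathbf{H}) = \mathcal{P}(\mathbf{H}_1) \cap \mathcal{P}(\bar{\mathbf{H}}_1),
\qquad
\mathcal{P}(\mathbf{H}') = \mathcal{P}\!\left( \begin{bmatrix} \mathbf{H}_1 \\ \mathbf{a} \mathbf{H}_1 \end{bmatrix} \right) \cap \mathcal{P}(\bar{\mathbf{H}}_1).
$$

Using Lemma 52 we conclude that $\mathcal{P}\!\left( \begin{bmatrix} \mathbf{H}_1 \\ \mathbf{a} \mathbf{H}_1 \end{bmatrix} \right)$ equals $\mathcal{P}(\mathbf{H}_1)$ and that therefore $\mathcal{P}(\mathbf{H}')$
equals $\mathcal{P}(\mathbf{H})$.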


A.17 Proof of Corollary 54

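Let $\mathbf{A}$ have $L$ rows, let $\mathbf{a}_\ell$, $\ell \in [L]$, be the vector containing the $\ell$-th row of $\mathbf{A}$, and let $\bar{\mathbf{H}}_\ell$,
$\ell \in [L]$, be the matrix that contains the rows of $\mathbf{H}$ that are not included in $\mathbf{H}_\ell$. We have

$$
\mathcal{P}(\mathbf{H}) = \bigcap_{\ell \in [L]} \bigl( \mathcal{P}(\mathbf{H}_\ell) \cap \mathcal{P}(\bar{\mathbf{H}}_\ell) \bigr),
\qquad
\mathcal{P}(\mathbf{H}') = \bigcap_{\ell \in [L]} \left( \mathcal{P}\!\left( \begin{bmatrix} \mathbf{H}_\ell \\ \mathbf{a}_\ell \mathbf{H}_\ell \end{bmatrix} \right) \cap \mathcal{P}(\bar{\mathbf{H}}_\ell) \right).
$$

Using Lemma 52 we conclude that $\mathcal{P}\!\left( \begin{bmatrix} \mathbf{H}_\ell \\ \mathbf{a}_\ell \mathbf{H}_\ell \end{bmatrix} \right)$ equals $\mathcal{P}(\mathbf{H}_\ell)$ for all $\ell \in [L]$ and that
therefore $\mathcal{P}(\mathbf{H}')$ equals $\mathcal{P}(\mathbf{H})$.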
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